Asymptote Considered Dangerous


                      When N is in a practical range , we should of
                      course be careful not to take such asymptotic
                      estimates too seriously. - Prof. Donald E. Knuth


In his well-regarded and popular text ,
"Prime Numbers and Computer Methods for Factorization" ,
Hans Riesel includes a section titled
"The Number of Prime Factors of Large Numbers" ,
where /Note 1/ , citing /Note 2/ , he states some work from Kac ,
Hardy and Ramanujan , which is the focus of this article :
        [ all emphasis original ]
        the number of prime factors of
        almost all integers about N is between
        -------------------
                (1-eps)ln ln N and (1+eps) ln ln N,
        for every eps > 0.  And
        almost all integers means
        -------------------
        a fraction of all integers
        as close to 1 as we wish.
        ------------------------

In the vicinity where  loge loge N  is about 1 , we have
14 = 2 * 7
15 = 3 * 5
16 = 2^4
This does not seem to hold a high density of numbers in the range
claimed by the statement , i.e. , with exactly 1 prime factor .

So , let's look where  loge loge N  is about 2 .
  omega  OMEGA                  N
      3      4               1602  =  2 * 3^2 * 89
      2      2               1603  =  7 * 229
      2      3               1604  =  2^2 * 401
      3      3               1605  =  3 * 5 * 107
      3      3               1606  =  2 * 11 * 73
      1      1               1607  =  1607
      3      5               1608  =  2^3 * 3 * 67
      1      1               1609  =  1609
      4      4               1610  =  2 * 5 * 7 * 23
      2      3               1611  =  3^2 * 179
      3      4               1612  =  2^2 * 13 * 31
      1      1               1613  =  1613
      3      3               1614  =  2 * 3 * 269
      3      3               1615  =  5 * 17 * 19
      2      5               1616  =  2^4 * 101
      3      4               1617  =  3 * 7^2 * 11
      2      2               1618  =  2 * 809
      1      1               1619  =  1619
      3      7               1620  =  2^2 * 3^4 * 5
      1      1               1621  =  1621
      2      2               1622  =  2 * 811
      2      2               1623  =  3 * 541
      3      5               1624  =  2^3 * 7 * 29
      2      4               1625  =  5^3 * 13
      3      3               1626  =  2 * 3 * 271
      1      1               1627  =  1627
      3      4               1628  =  2^2 * 11 * 37
      2      3               1629  =  3^2 * 181
      3      3               1630  =  2 * 5 * 163
      2      2               1631  =  7 * 233
      3      7               1632  =  2^5 * 3 * 17
      2      2               1633  =  23 * 71
      3      3               1634  =  2 * 19 * 43
Here only 11 of the 33 listed have exactly 2 factors .
Hardly "almost all" .

So , let's look where  loge loge N  is about 3 .
  omega  OMEGA                  N
      6      6          528491310  =  2 * 3 * 5 * 67 * 241 * 1091
      3      3          528491311  =  73 * 691 * 10477
      5      8          528491312  =  2^4 * 41 * 47 * 61 * 281
      4      8          528491313  =  3^2 * 7^4 * 37 * 661
Here only 1 of the 4 listed has exactly 3 factors , but let's look at a
larger sample .
For the 2000 integers nearest  e ^ e^3 , here is the distribution of
omegas :
omega             count
 1                 113
 2                 387
 3                 661
 4                 536
 5                 248
 6                  49
 7                   6
The central tendency is clear ,
but 661 out of 2000 is hardly "almost all" .

So , let's look where  loge loge N  is about 3.5 .
  omega  OMEGA                  N
      2      2    240911788282549  =  284423 * 847019363
      6      7    240911788282550  =  2 * 5^2 * 7 * 11 * 114473 * 546631
      4      5    240911788282551  =  3^2 * 449 * 617 * 96623783
      4      6    240911788282552  =  2^3 * 47 * 139 * 4609516843
None of these 4 listed have nearly 3.5 factors .
For the 8000 integers nearest  e ^ e^3.5 , here is the distribution of
omegas :
omega             count
 1                 243
 2                1095
 3                2142
 4                2281
 5                1442
 6                 597
 7                 167
 8                  30
 9                   3


An extreme case is where  loge loge N  is equal to an integer plus .5 .
This is the case where the interval about the normal value determined by
eps  must be large enough to overlap the two adjacent integers , for
there to be even a theoretical possibility for the statement to be true .
How large must  N  then be ?

If  frac loge loge N  =  .5 ,
then  1  <  ( 1+eps ) loge loge N  -  ( 1-eps ) loge loge N
and  1  <  2 * eps * loge loge N
and  loge loge N  >  .5 / eps .

For  eps = .01 , loge loge N  >  50  and
N  >  e ^ 5184705528587072464087.45  ~=  10 ^ 2251689001358648043629.43

For any reasonable  eps , for the statement to be true ,  N  must be
enormous .
Let alone "for every eps > 0" .
For numbers small enough to be written down in a book or stored in a
computer , the statement is not true .


Asymptotic analysis is very useful because the asymptotic behavior of a
function can provide penetrating insight into the behavior of anything
which it describes .
There is a realm within which the statement is true and
there is another realm within which the statement is not true .
The former realm is large , but hopelessly impractical .
The latter realm is small , but practical .
When we consider Euclid's theorem /Note 3/ ,
"There are an infinite number of primes" , we do not imagine that it
applies to any , necessarily finite , list we might construct .

One of the major applications of asymptotic analysis is in evaluating
the feasibility or scaling of algorithms .

Factorization , a major focus of Riesel's text /Note 1/ , makes an
illustrative case study .
For a large N , about which nothing is known of its factors , the first
line of attack is with trial division .
To find if small factors exist , trial division is highly efficient ,
far faster than any other known method .
But asymptotically , trial division is much slower than other known
algorithms .
Thus , asymptotic analysis suggests that trial division should be
limited .
Pollard's rho and p-1 may also be useful to find any somewhat larger ,
but still relatively small , factors .
The asymptotically faster ECM is a present-day workhorse to find still
larger factors .
Still faster , asymptotically , than ECM is QS , also widely used on
larger numbers .
If N is not too large , e.g. , less than about 200 digits , the
asymptotically fastest practical method known , NFS , may produce a
complete factorization , as is the case with QS , even if both or all
three factors are near  N ^ .5 or N ^ 1/3 , and may do so faster than
ECM or QS .
For very large  N , exhausting multiple iterations of ECM , in the hope
of finding relatively small factors up to about 70 digits , may be the
only practical approach .
Various strategies have been proposed to optimize switching between
these methods and deciding when to test for primality of the remaining
co-factor .
Asymptotic analysis powerfully informs these strategies , but provides
no insight into the relative usefulness of the various algorithms .


Analysis will be informed by asymptotes , but will not be captive to
them .
In practical realms , sometimes they may safely be ignored altogether .


---


I thank Prof. Edsger W. Dijkstra for inspiring the title of this
article /Note 4/.

Herein omega is a substitute for the lower-case Greek letter .
OMEGA is a substitute for the upper-case Greek letter .
eps is a substitute for the lower-case Greek letter , epsilon .

/1/ "Prime Numbers and Computer Methods for Factorization" ,
by Hans Riesel , 2e , Birkhauser , 1994 , pp 157 , 159 .
/2/ "Statistical Independence in Probability Analysis and Number
Theory" , by Mark Kac , 1959 , pp 71-74 , credits Hardy and Ramanujan
with the original statement and its characterization .
Erd"os and Kac proved a related theorem , stated on p. 75 .
/3/ "Elements" , by Euclid , c. 300 B.C.E. , Book IX , Proposition 20 .
/4/ "Go-To Statement Considered Harmful" , by E. W. Dijkstra , Letter to
the Editor , "Communications of the ACM" , 1968 March .
/5/ "Seminumerical Algorithms" , by D. E. Knuth , 3e , Addison-Wesley ,
1998 , p. 399 .

Walter Nissen
2007-10-21