Sunday, June 16, 2019

Graphing frequencies versus distance

If we take the last digits of the first 10k primes, we see that pairs don't occur with the same frequencies as they do in a random sequence of 1s, 3s, 7s and 9s. For random digits, the average distance between any pair of 1,3,7,9 is about four digits, but for prime last digits the distances are all different.




Wednesday, June 12, 2019

Frequencies versus distances.

We might expect a strong relationship between, eg, the average distance from 1 to 7 (in that order) and the number of times the pair 1,7 (in that order) occurs in the last-digit tables.

look for pair frequency
-- 351 795 860 380
-- 508 307 716 870
-- 629 696 311 775
-- 899 603 523 365
NR =  9592
Sums are  2386 2401 2411 2390
Proportions for 1,x are: 0.147108 0.333194 0.360436 0.159262
Proportions for 3,x are: 0.211579 0.127863 0.298209 0.362349
Proportions for 7,x are: 0.260888 0.288677 0.128992 0.321443
Proportions for 9,x are: 0.376151 0.252301 0.218828 0.15272

We could start by looking at the tables above.

Distances for pairs from random generation of 1,3,7,9.


I wanted to confirm that random 1,3,7,9 digits give nice, symmetric frequencies. So I first created a 10k file of random 0,1,2,3 digits, then changed all the 0s to 7s and all the 2s to 9s to give a file called randoms10k1379A.txt. {
https://drive.google.com/open?id=1kAbHBenOM6mSXECkgvOiqI9Ftfthrwny
https://pastebin.com/akpWhRdE
}. Then I wrote a TinyC program, primDist4E.c { https://pastebin.com/archive/text }, to report the distances between pairs of 1,3,7,9 digits in the list. It outputs, for instance, all the 7,x distances to a file called distFile7.txt. Here's an example of that file using just 100 prime digits.

1 2 3 4 To be read in conjunction with primdist4.c
4 2 5 1
1 2 3 5
3 1 4 2
1 2 6 3
1 2 3 4
4 2 3 1
1 7 2 3
3 5 4 1
5 1 2 4
3 1 6 2
2 3 4 1
4 2 6 1
3 1 4 2
1 2 4 11
1 2 3 7
1 5 2 4
11 3 1 2
10 2 4 1
6 1 5 2
1 6 10 2
1 2 3 4
3 5 2 1
1 3 0 2
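The columns are the distances from the anchor 7 to the next 1, 3, 7 and 9, in that order. The first line goes 1 2 3 4, which means the first 7 is distance 1 away from the next 1, 2 away from the next 3, 3 away from the next 7 and 4 away from the next 9. The symmetry is just a coincidence.
Note that lines 18 and 19 indicate that the 7 is a long way away from the first 1 after it.
There should be no zeros except in the last lines, where the list runs out before 1, 3, 7 and 9 have all been found after the anchor 7. What about the sums of the rows? The smallest possible sum is that of line 1, viz 1+2+3+4 = 10, because the four distances in a row are distinct positive integers. It corresponds to consecutive prime last digits of 7 1 3 7 9 (ie ..07 11 13 17 19 ...) in the prime number list.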
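
The row format described above can be sketched in C as follows. This is only an illustration of the idea, not the code of primDist4E.c; the names distances_from and TARGETS are made up for this sketch, and it assumes the digits are already in an array rather than read from a file.

```c
#include <assert.h>
#include <stddef.h>

/* Column order assumed for the distance files: 1, 3, 7, 9. */
static const int TARGETS[4] = {1, 3, 7, 9};

/*
 * For the digit at position `anchor` in digits[0..n-1], store in dist[k]
 * the number of steps forward to the next occurrence of TARGETS[k].
 * A distance of 0 means the list ran out before that digit was found.
 * (Hypothetical helper, not taken from primDist4E.c.)
 */
void distances_from(const int *digits, size_t n, size_t anchor, int dist[4])
{
    assert(anchor < n);
    for (int k = 0; k < 4; k++) {
        dist[k] = 0;
        for (size_t i = anchor + 1; i < n; i++) {
            if (digits[i] == TARGETS[k]) {
                dist[k] = (int)(i - anchor);
                break;
            }
        }
    }
}
```

Called with the digits 7 1 3 7 9 and anchor 0, it fills dist with 1 2 3 4, the same as the first line of the file above. Called with anchor 3 (the second 7), only the 9 is still ahead, so the other three entries stay 0, which is how zeros end up in the last lines.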

The gawk script that averages each column of a distance file is doStatsOnDistFiles0.gawk:
https://pastebin.com/uFiABNwV

C:\Users\peterb\Desktop\NewSoftware\GnuWin\GetGnuWin32\gnuwin32\bin\gawk -f C:\Users\peterb\Desktop\NewSoftware\TinyC\tcc-0.9.27-win32-bin\tcc\Primes0\doStatsOnDistFiles0.gawk distFile1.txt
Here comes the stats from doStatsOnDistFiles0.gawk
11093 10652 10775 11115
Averages below:
Number of records is 2732
4.0604  is mean of column 1
3.89898  is mean of column 2
3.944  is mean of column 3
4.06845  is mean of column 4

C:\Users\peterb\Desktop\NewSoftware\GnuWin\GetGnuWin32\gnuwin32\bin\gawk -f C:\Users\peterb\Desktop\NewSoftware\TinyC\tcc-0.9.27-win32-bin\tcc\Primes0\doStatsOnDistFiles0.gawk distFile3.txt
Here comes the stats from doStatsOnDistFiles0.gawk
11344 11090 11253 11104
Averages below:
Number of records is 2806
4.04277  is mean of column 1
3.95225  is mean of column 2
4.01033  is mean of column 3
3.95723  is mean of column 4
C:\Users\peterb\Desktop\NewSoftware\GnuWin\GetGnuWin32\gnuwin32\bin\gawk -f C:\Users\peterb\Desktop\NewSoftware\TinyC\tcc-0.9.27-win32-bin\tcc\Primes0\doStatsOnDistFiles0.gawk distFile7.txt
Here comes the stats from doStatsOnDistFiles0.gawk
11943 10945 11083 11247
Averages below:
Number of records is 2798
4.26841  is mean of column 1
3.91172  is mean of column 2
3.96104  is mean of column 3
4.01966  is mean of column 4
C:\Users\peterb\Desktop\NewSoftware\GnuWin\GetGnuWin32\gnuwin32\bin\gawk -f C:\Users\peterb\Desktop\NewSoftware\TinyC\tcc-0.9.27-win32-bin\tcc\Primes0\doStatsOnDistFiles0.gawk distFile9.txt
Here comes the stats from doStatsOnDistFiles0.gawk
11161 10837 11074 11094
Averages below:
Number of records is 2757
4.04824  is mean of column 1
3.93072  is mean of column 2
4.01668  is mean of column 3
4.02394  is mean of column 4

The entry 4.01668 (mean of column 3) in the last run above, shown in red in the original post, is the output when doStatsOnDistFiles0.gawk works on distFile9.txt, ie the average distance from a 9 to the next 7.

Note that in the above, the mean distances for 1,1 and 1,3 and 1,7 ... and 9,9 are all close to 4.0. Not sure why. I expect symmetry but can't calculate the expectation.
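
One possible way to see where the 4.0 could come from, assuming each digit in the random file is drawn independently and with equal probability from 1, 3, 7, 9 (an idealisation, not something checked here): the distance D from an anchor to the next occurrence of a given digit is then a geometric waiting time with success probability 1/4.

```latex
P(D = k) = \left(\tfrac{3}{4}\right)^{k-1} \tfrac{1}{4}, \quad k = 1, 2, 3, \dots
\qquad
E[D] = \sum_{k=1}^{\infty} k \left(\tfrac{3}{4}\right)^{k-1} \tfrac{1}{4} = \frac{1}{1/4} = 4
```

Under that assumption the expectation is the same for all 16 ordered pairs, including 1,1 and 9,9, which would be consistent with every column mean above being close to 4.0.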



Prime Distances 2

I took the first 10K (more or less) primes and made a file of the last digits. {
 https://drive.google.com/open?id=1iVTHNs9oeX5LPYQr0VgiaUwUOoXUuHBb
(lastdigit0.txt in google drive) }

Then I looked at pairs of consecutive digits and counted their frequencies, using the gawk script pairs1.txt below.

Below is a summary of the pair frequencies of the last digits of the first 10K (more or less) primes.

# File below is the gawk program pairs1.txt.
# Input: one last digit per line. Counts every ordered pair (previous, current).
BEGIN { print "look for pair frequency" }

# From the second record on, count the pair formed with the preceding digit.
NR > 1 { ar[previous, $1]++ }
{ previous = $1 }

END {
    # 4x4 table of counts: rows are the first digit, columns the second.
    print "--", ar[1,1], ar[1,3], ar[1,7], ar[1,9]
    print "--", ar[3,1], ar[3,3], ar[3,7], ar[3,9]
    print "--", ar[7,1], ar[7,3], ar[7,7], ar[7,9]
    print "--", ar[9,1], ar[9,3], ar[9,7], ar[9,9]
    print "NR = ", NR

    # Row totals, used to turn counts into proportions.
    s[1] = ar[1,1] + ar[1,3] + ar[1,7] + ar[1,9]
    s[3] = ar[3,1] + ar[3,3] + ar[3,7] + ar[3,9]
    s[7] = ar[7,1] + ar[7,3] + ar[7,7] + ar[7,9]
    s[9] = ar[9,1] + ar[9,3] + ar[9,7] + ar[9,9]
    print "Sums are ", s[1], s[3], s[7], s[9]

    print "Proportions for 1,x are: " ar[1,1]/s[1], ar[1,3]/s[1], ar[1,7]/s[1], ar[1,9]/s[1]
    print "Proportions for 3,x are: " ar[3,1]/s[3], ar[3,3]/s[3], ar[3,7]/s[3], ar[3,9]/s[3]
    print "Proportions for 7,x are: " ar[7,1]/s[7], ar[7,3]/s[7], ar[7,7]/s[7], ar[7,9]/s[7]
    print "Proportions for 9,x are: " ar[9,1]/s[9], ar[9,3]/s[9], ar[9,7]/s[9], ar[9,9]/s[9]
}
---------------output--------------
"C:\Users\Dell\Desktop\Setup Files\GnuWin\GetGnuWin32\gnuwin32\bin\gawk.exe"    -f  C:\Users\Dell\Documents\Primes\Distances0\gawkStuff\pairs1.txt C:\Users\Dell\Documents\Primes\PrimeLists\lastdigit0.txt
look for pair frequency
-- 351 795 860 380
-- 508 307 716 870
-- 629 696 311 775
-- 899 603 523 365
NR =  9592
Sums are  2386 2401 2411 2390
Proportions for 1,x are: 0.147108 0.333194 0.360436 0.159262
Proportions for 3,x are: 0.211579 0.127863 0.298209 0.362349
Proportions for 7,x are: 0.260888 0.288677 0.128992 0.321443
Proportions for 9,x are: 0.376151 0.252301 0.218828 0.15272

The 0.298209 in the 3,x row above (the big red number in the original post) tells us that the pair 3,7 accounts for 29.8209% of all pairs of the form 3,x. If the last digits behaved like random digits, we would expect these numbers all to be 0.25 plus or minus a little bit.
Note that the lowest number in each row is the one for 1,1 or 3,3 or 7,7 or 9,9. Conclusion: there's a low probability of repetition.
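
To put a rough size on "a little bit": assuming each of the 2401 pairs in the 3,x row were an independent draw with probability 0.25 for each second digit (an idealisation that the primes evidently don't satisfy), the standard error SE of a proportion in that row, and the distance of the 3,7 and 3,3 proportions from 0.25 in units of SE, would be about:

```latex
\mathrm{SE} = \sqrt{\frac{0.25 \times 0.75}{2401}} \approx 0.0088,
\qquad
\frac{0.298209 - 0.25}{\mathrm{SE}} \approx 5.5,
\qquad
\frac{0.127863 - 0.25}{\mathrm{SE}} \approx -13.8
```

On this rough reckoning the 3,7 and 3,3 proportions sit many standard errors away from 0.25, which is a good deal more than a little bit.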