Computing Pi

How well do Patterson's sequential and parallel methods of computing the quantity π/4 perform? Recall that the sequential method was to sum the infinite series for the arctangent of one (whose value is π/4), while the parallel method was to estimate the probability of a point falling within a unit circle when that point is randomly chosen from the square enclosing that circle.

To evaluate the sequential algorithm, let's look at its rate of convergence:

Iteration    π/4       π       Error
    0      1.0000   4.0000    0.8584
    1      0.6667   2.6667   -0.4749
    2      0.8667   3.4667    0.3251
    3      0.7238   2.8952   -0.2464
    4      0.8349   3.3397    0.1981
    5      0.7440   2.9760   -0.1655
    6      0.8209   3.2837    0.1421
    7      0.7543   3.0171   -0.1245
    8      0.8131   3.2524    0.1108
    9      0.7605   3.0418   -0.0998
   10      0.8081   3.2323    0.0907
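
For concreteness, here is a minimal C sketch that reproduces this table, assuming (as the numbers above indicate) that the series in question is the classic Leibniz series π/4 = 1 - 1/3 + 1/5 - 1/7 + ...:

#include <stdio.h>

/* Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... */
int main(void)
{
	double sum = 0.0;
	double sign = 1.0;
	int i;

	for (i = 0; i <= 10; i++) {
		sum += sign / (2.0 * i + 1.0);
		sign = -sign;
		printf("%2d  %.4f  %.4f  %7.4f\n",
		       i, sum, 4.0 * sum,
		       4.0 * sum - 3.14159265358979312);
	}
	return 0;
}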

This rate of convergence is painfully slow: after ten iterations, we do not yet have even two correct decimal digits of π. A few centuries ago, John Machin derived a much faster series for π/4 based on the following formula:

	π/4 = 4 arctan(1/5) - arctan(1/239)

http://en.wikipedia.org/wiki/John_Machin

Machin's series accumulates about 1.5 decimal digits per iteration:

Iteration         π/4                  π                  Error
    0     .795815899581589958  3.18326359832635983   .04167094473656659
    1     .785149257331515078  3.14059702932606031  -.00099562426373292
    2     .785405257331258606  3.14162102932503442   .00002837573524118
    3     .785397943045544323  3.14159177218217729  -.00000088140761594
    4     .785398170601099879  3.14159268240439951   .00000002881460627
    5     .785398163153827152  3.14159265261530860  -.00000000097448463
    6     .785398163405888690  3.14159265362355476   .00000000003376152
    7     .785398163397150557  3.14159265358860222  -.00000000000119100
    8     .785398163397458961  3.14159265358983584   .00000000000004260
    9     .785398163397447924  3.14159265358979169  -.00000000000000154
   10     .785398163397448323  3.14159265358979329   .00000000000000005
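
For reference, here is a double-precision C sketch of this computation. IEEE double arithmetic carries only about 16 significant digits, so its last digit or two will disagree with the table above, which was presumably produced with extended-precision arithmetic such as that of bc:

#include <stdio.h>

/* Machin: pi/4 = 4*atan(1/5) - atan(1/239), with each arctangent
 * expanded as x - x^3/3 + x^5/5 - ..., summed term by term. */
int main(void)
{
	double x5 = 1.0 / 5.0, x239 = 1.0 / 239.0;
	double p5 = x5, p239 = x239;	/* current odd powers of x */
	double sum = 0.0;
	double sign = 1.0;
	int i;

	for (i = 0; i <= 10; i++) {
		sum += sign * (4.0 * p5 - p239) / (2.0 * i + 1.0);
		p5 *= x5 * x5;
		p239 *= x239 * x239;
		sign = -sign;
		printf("%2d  %.18f  %.18f\n", i, sum, 4.0 * sum);
	}
	return 0;
}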

So Patterson's choice of series for π/4 falls a bit short of recent practice, to say the least.

But what of his Monte Carlo parallel method? One can clearly build a good parallel program on it by having each of N tasks compute 1/N of the desired number of trials, and then combining their results. Increasing the desired number of trials will drive the parallel communication overhead down to an arbitrarily low level. So scalability of Patterson's Monte Carlo method is nothing short of excellent.
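
As a sketch of what such a program might look like (this is my own minimal pthreads example, not Patterson's code, and it uses the equivalent first-quadrant formulation, in which the fraction of random points falling within the quarter circle estimates π/4):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 4
#define TRIALS_PER_THREAD 10000000L

/* Each task runs its share of the trials, counting hits. */
static void *worker(void *arg)
{
	unsigned int seed = (unsigned int)(uintptr_t)arg;
	long hits = 0;
	long i;

	for (i = 0; i < TRIALS_PER_THREAD; i++) {
		double x = rand_r(&seed) / (double)RAND_MAX;
		double y = rand_r(&seed) / (double)RAND_MAX;

		if (x * x + y * y <= 1.0)
			hits++;
	}
	return (void *)hits;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long hits = 0;
	void *ret;
	int i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker,
			       (void *)(uintptr_t)(i + 1));
	for (i = 0; i < NTHREADS; i++) {
		pthread_join(tid[i], &ret);
		hits += (long)ret;	/* combining results: the only communication */
	}
	printf("pi ~= %.6f\n",
	       4.0 * hits / ((double)NTHREADS * TRIALS_PER_THREAD));
	return 0;
}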

But scalability is only part of the problem to be solved. Raw performance is also critically important.

To analyze this method's performance, consider that the first trial will either be within the unit circle (which gives the estimate π=4) or outside of it (which gives the estimate π=0). The second trial adds the possibility of the estimate π=2. More generally, an estimate based on N trials must be a multiple of 4/N, so after ten trials the best possible estimate is π=3.2, which is correct to only one digit. After one hundred trials, we might (or might not) have an estimate correct to two digits (π=3.1).
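
To see this granularity at work, the following C fragment prints every estimate attainable from ten trials; note that π=3.0 is not even among them, and the best available is π=3.2:

#include <stdio.h>

/* With N trials and k hits, the only attainable estimates are 4k/N. */
int main(void)
{
	int N = 10;
	int k;

	for (k = 0; k <= N; k++)
		printf("%.1f ", 4.0 * k / N);
	printf("\n");
	return 0;
}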

Continuing this line of thought:

     Number of Trials    Maximum Possible Correct Digits of π
                    1     0
                   10     1
                  100     2
                1,000     3
               10,000     4
              100,000     5
            1,000,000     6
           10,000,000     7
          100,000,000     8
        1,000,000,000     9
       10,000,000,000    10
      100,000,000,000    11
    1,000,000,000,000    12
   10,000,000,000,000    13
  100,000,000,000,000    14
1,000,000,000,000,000    15

Attaining the 15-digit accuracy that Machin's formula delivered after only 10 iterations thus requires at least one quadrillion (10^15) trials. Therefore, if we had a 1,000-CPU system in which each CPU could carry out one trial per nanosecond, computing π to at most 15 places would take 10^15 / (10^3 × 10^9) = 1,000 seconds, which is more than 15 minutes. In contrast, my laptop system took 20 milliseconds to compute this same accuracy using Machin's formula, even with the performance penalty imposed by the bc interpreter. In short, when doing mathematics with parallel computers, you need not only good parallel design, but also good mathematics. Patterson's approach scores quite highly in the parallel-design department, but falls short in the good-mathematics department.

That said, Machin lived centuries ago. What is the state of the art of computing π in 2010?

As of this writing, the effort that has computed the most digits of π on a desktop system used y-cruncher, which, for the record, is a parallel program. That computation used the Chudnovsky formula, which delivers about 14 decimal digits per iteration and is based on earlier work by Ramanujan.

http://en.wikipedia.org/wiki/Numerical_approximations_of_%CF%80#Efficient_methods
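
Just to illustrate the convergence rate (and only that: a serious computation such as y-cruncher's requires arbitrary-precision arithmetic), here is a double-precision C sketch of the first few terms of the Chudnovsky series, which exhausts IEEE double's roughly 16 digits after only two terms:

#include <math.h>
#include <stdio.h>

/* Chudnovsky series for 1/pi, evaluated in (inadequate) double
 * precision; each term contributes about 14 decimal digits. */
int main(void)
{
	double sum = 0.0;
	int k;

	for (k = 0; k <= 2; k++) {
		double sign = (k % 2) ? -1.0 : 1.0;
		double num = sign * tgamma(6.0 * k + 1.0) *
			     (13591409.0 + 545140134.0 * k);
		double den = tgamma(3.0 * k + 1.0) *
			     pow(tgamma(k + 1.0), 3.0) *
			     pow(640320.0, 3.0 * k + 1.5);

		sum += num / den;
		printf("k=%d: pi ~= %.15f\n", k, 1.0 / (12.0 * sum));
	}
	return 0;
}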

So, once again, attaining good performance and scalability requires not just good parallel design, but also good mathematics.