An RCU bug resulted in about one “near miss” per hour, and I used ten-hour test runs to guide debugging.
So why ten hours?
Stand back! Obtaining this answer will require statistics!!!
And the statistical tool appropriate for this situation is the celebrated Poisson distribution. If you have a way of calculating the cumulative Poisson distribution, you give it the number of events seen in the test and the number of events expected in that test, and it gives you the probability of that many events (or fewer) occurring due to random chance.
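For reference, the Poisson distribution with expected event count lambda assigns probability

    P(k) = lambda^k * exp(-lambda) / k!

to seeing exactly k events, so the probability of seeing no events at all in an interval where lambda events are expected is simply exp(-lambda).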
In this case, I am proposing running a ten-hour test that has historically
resulted in about one near miss per hour, which means that I would expect
ten near misses in my ten-hour test.
If I instead see zero near misses, what is the probability that the lack of
errors was a fluke, AKA a false negative?
We can ask this question of the maxima open-source symbolic-math package as follows:
load(distrib); bfloat(cdf_poisson(0,10));
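The load(distrib) pulls in maxima's distrib package, which is what provides cdf_poisson().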
The bfloat() converts the answer to floating point, as opposed to the somewhat unhelpful gamma_incomplete_regularized(1, 10) that maxima would otherwise give you.
The cdf_poisson(0,10) computes the Poisson cumulative distribution function, which gives the probability of zero or fewer events occurring in a time interval in which ten events would normally occur. This probability turns out to be 4.539992976248485b-5, which means that we should have about 99.995% confidence (AKA “four nines”) that a failure-free ten-hour run is not a false negative.
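This is just exp(-10) in the formula above, so typing bfloat(exp(-10)); at maxima should produce the same value.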
Of course, statistics cannot prove that the bug went away entirely. For example, if an alleged fix decreased the bug's rate of occurrence by an order of magnitude (as opposed to fixing the bug entirely), then a ten-hour failure-free run would be quite probable, occurring about 37% of the time. This of course means that we should use a longer test to validate any proposed fix.
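To check that 37% figure, the same recipe applies with an expectation of one near miss rather than ten:

    bfloat(cdf_poisson(0,1));

This should print approximately 3.6788b-1, that is, exp(-1), or about 37%.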
In this case, I used a 1,000-hour test run, which would be expected
to have 1,000 near misses with the bug still in place.
Typing bfloat(cdf_poisson(0,1000)); at maxima resulted in 5.075958897549416b-435.
In other words, the percentage representing the confidence that such
a run is not a fluke has 434 nines, two before the decimal point and 432 after.
We could therefore be extremely confident that an error-free 1,000-hour
run means that our proposed fix did something.
But we can conclude more.
For example, let's suppose that the proposed fix was incomplete, so that
it reduced the near-miss rate by only two orders of magnitude.
In that case, we would expect ten near misses in the 1,000-hour run,
so that the false-negative probability is again computed using bfloat(cdf_poisson(0,10)), meaning that we can be 99.995% confident that a near-miss-free 1,000-hour run indicates that the proposed fix reduced the near-miss rate by at least two orders of magnitude.
In short, although we cannot prove the absence of errors by testing, we can establish high degrees of confidence in greatly decreased probabilities of occurrence!