On testing pseudorandom generators via statistical tests based on the arcsine law¹
¹ Work supported by NCN Research Grant DEC-2013/10/E/ST1/00359
Paweł Lorek
Grzegorz Łoś
Karol Gotfryd
Filip Zagórski
Mathematical Institute, University of Wrocław, pl. Grunwaldzki 2/4, 50-384, Wrocław, Poland
Institute of Computer Science, University of Wrocław, Joliot-Curie 15, 50-383, Wrocław, Poland
Department of Computer Science, Faculty of Fundamental Problems of Technology, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
Abstract
Testing the quality of pseudorandom number generators is an important issue. Security requirements are becoming more and more demanding, and weaknesses in this matter are simply not acceptable. There is a need for an in-depth analysis of statistical tests – one has to be sure that rejecting or accepting a generator as good is not a result of errors in computations or approximations. In this paper we propose a second level statistical test based on the arcsine law for random walks. We provide a Berry-Esseen type inequality for approximating the arcsine distribution, which allows us to perform a detailed error analysis of the proposed test.
Keywords: The arcsine law, Random walks, Pseudorandom number generator, Statistical testing, Second level testing, Berry-Esseen type inequality, Randomness, Dyck paths
Journal: Journal of Computational and Applied Mathematics
1 Introduction
Random numbers are key ingredients in various applications, e.g., in cryptography (for generating cryptographic keys) or in simulations (in Monte Carlo methods), just to mention a few. No algorithm can produce truly random numbers. Instead, pseudorandom number generators (PRNGs) are used. These are deterministic algorithms producing numbers which we expect to resemble truly random ones in some sense. There are two classes of tests used to evaluate PRNGs: theoretical and statistical ones. Theoretical tests examine the intrinsic structure of a given generator; the sequence does not necessarily need to be generated. Two classical examples are the lattice test [1] and the spectral test described in [2] (Section 3.3.4). See also [3] for a description of some standard tests from this class. Tests in this category are very specific to each family of generators, e.g., some are designed only for linear congruential generators. On the other hand, the second class of tests – empirical tests – are conducted on a sequence generated by a PRNG and require no knowledge of how it was produced. The main goal of these tests is to check whether the sequence of numbers (or bits, depending on the actual implementation) produced by a PRNG has properties similar to those of a sequence generated truly at random. These tests try to find statistical evidence against the null hypothesis $H_0$ stating that the sequence is a sample from independent random variables with uniform distribution. Any function of a finite number of uniformly distributed random variables whose (sometimes approximate) distribution under $H_0$ is known can be used as a statistical test. Due to the popularity and significance of the problem, a variety of testing procedures have been developed in recent years. Such statistical tests aim at detecting various deviations in generated sequences, which allows for revealing flawed PRNGs producing predictable output.
Some of the procedures encompass classical tools from statistics like the Kolmogorov-Smirnov test or Pearson’s chi-squared test, which are used for comparing the theoretical and empirical distributions of appropriate statistics calculated for a PRNG’s output. It is also possible to adapt tests of normality like the Anderson-Darling or Shapiro-Wilk tests to appropriately transformed pseudorandom sequences. These methods exploit the properties of sequences of i.i.d. random variables. Based on the original sequence returned by the examined PRNG we are able to obtain realizations of random variables with known theoretical distributions. Some examples of probabilistic laws used in practice in this kind of tests can be found, e.g., in [2]. They include procedures such as the gap test, the permutation test and the coupon collector’s test, just to name a few (see [2] for a more detailed treatment). These methods also have the advantage that they implicitly test the independence of the generator’s output. The main issue with such methods is that a single statistical test looks only at some specific property that holds for sequences of truly random numbers. Hence, for practical purposes, bundles of diverse tests are created. Such a test bundle consists of a series of individual procedures based on various stochastic laws from probability theory. A PRNG is then considered good if the pseudorandom sequences it produces pass all tests in a given bundle. Note that formally this proves nothing, but it increases the confidence in the simulation results. Thus, they are actually tests for non-randomness, as pointed out in [4]. Some examples of such test suites are Marsaglia’s Diehard Battery of Tests of Randomness from 1995, Dieharder developed by Brown et al. (see [5]), TestU01 implemented by L’Ecuyer and Simard (see [6, 7]) and the NIST Test Suite [8].
The last one, designed by the National Institute of Standards and Technology, is currently considered one of the state-of-the-art test bundles. It is often used in the preparation of formal certifications or approvals.
The result of a single statistical test is typically given in the form of a $p$-value, which, informally speaking, represents the probability that a perfect PRNG would produce a “less random” sequence than the sequence being tested w.r.t. the statistic used. We then reject $H_0$ if $p < \alpha$, where $\alpha$ is the significance level (usually $\alpha = 0.01$), and accept $H_0$ if $p \ge \alpha$. Such an approach is usually called a one level or first level test. Although the interpretation of a single $p$-value has a clear statistical explanation, it is not quite obvious how to interpret the results of a test bundle, i.e., of multiple tests. Under $H_0$ the distribution of $p$-values is uniform. However, in a test bundle several different tests are applied to the same output of a PRNG, hence the results are usually correlated. The documentation of the NIST Test Suite includes some clues on how to interpret the results of the bundle (Section 4.2 in [9]), but in the introduction it is frankly stated: “It is up to the tester to determine the correct interpretation of the test results”.
To disclose flaws of PRNGs, a very long sequence is often required. In such situations, the applicability of a statistical test can be limited (depending on the test statistic) by the memory size of the computer. An alternative approach is to use a so-called two level (a term used, e.g., in [3]) or second level (a term used, e.g., in [4, 10]) test. In this approach we take into account several results from the same test over disjoint sequences generated by a PRNG. We obtain several $p$-values which are uniformly distributed under $H_0$, and this is tested by, e.g., some goodness-of-fit test (with a potentially different level of significance – NIST suggests using $\alpha = 0.0001$), yielding a new “final” $p$-value. The authors in [11] observed that this method may be comparable to a first level test in terms of the power of a test (informally speaking, the power represents the probability of observing a “less random” sequence than the sequence being tested under an alternative hypothesis $H_1$, see [11] for details), but it often produces much more accurate results, as shown in [4]. Roughly speaking, the accuracy is related to the ability, given a non-random PRNG, to recognize its sequences as non-random (for details see [4]). We will follow this approach.
In the second level approach one has to take into consideration the approximation errors in the computation of a $p$-value. For example, in a first level test one usually calculates a $p$-value of a statistic which – under $H_0$ – is approximately normally distributed. The approximation then comes from the central limit theorem, which lets us substitute the distribution of a given (normalized) sum with the standard normal distribution. These errors in the calculations of individual $p$-values may accumulate, resulting in an error of the $p$-value in a second level test, thus making the test unreliable. Following [4] we say that the second level test is not reliable when, due to errors or approximations in the computation of $p$-values (in the first level), the distribution of $p$-values is not uniform under $H_0$. Fortunately, this approximation error can be bounded using the Berry-Esseen inequality, and the final error of a second level test can be controlled (see [4, 10] for a detailed example based on the binary matrix rank test). The influence of approximations on the computation of $p$-values in a second level test was also considered in [12, 13]. In this article we present a statistical test based on the arcsine law, in which at some point we approximate the distribution of some random variable with the arcsine distribution. We provide a Berry-Esseen type inequality giving an upper bound on the approximation error, which allows us to control the reliability of our second level test.
An interesting approach for testing PRNGs was presented by Kim et al. in [14]. The concept of their tests is based on the properties of a random walk (the gambler’s ruin algorithm) on the cyclic group $\mathbb{Z}_n$ with 0 being an absorbing state – more precisely, on the time until absorption. The authors in [14] propose three different variants of the test. The general idea of the basic procedure is the following. For some fixed $x \in \mathbb{Z}_n$ and $p \in (0, 1)$, the output of a PRNG is treated as numbers $U_1, U_2, \dots$ from the unit interval and used to define a random walk starting in $x$ such that if $U_k < p$ and the process is in state $j$, then it moves to $(j + 1) \bmod n$, otherwise it moves to $(j - 1) \bmod n$. The aim of this test is to compare the theoretical and the empirical distributions of the time to absorption in 0 when starting at $x$. Based on the values of the test statistic, the PRNG is then either accepted or rejected. The authors reported some “hidden defects” in the widely used Mersenne Twister generator. However, one has to be very careful when dealing with randomness. It seems that re-seeding a PRNG with a fixed seed is an error which can lead to wrong conclusions. This criticism was raised by Ekkehard and Grønvik in [15], where the authors also showed that properly performed tests of Kim et al. [14] do not reveal any defects in the Mersenne Twister PRNG. Recently, the authors in [16] have proposed another gambler’s ruin based procedure for testing PRNGs. In their method they exploited formulas for winning probabilities for arbitrary sequences $p(i)$ and $q(i)$ (i.e., the winning and losing probabilities depend on the current fortune $i$), which are the parameters of the algorithm.
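The basic walk of Kim et al. described above can be sketched in a few lines. This is a minimal illustration under our own naming (`absorption_time` and its arguments are hypothetical), not the authors' implementation; it simulates a single absorption time and does not reproduce their test statistic.

```python
from typing import Iterable


def absorption_time(n: int, x: int, p: float, uniforms: Iterable[float]) -> int:
    """Number of steps until a walk on Z_n started at x first hits the absorbing state 0.

    From state j the walk moves to (j + 1) mod n if the next uniform U_k < p,
    otherwise to (j - 1) mod n. Raises ValueError if the uniforms run out first.
    """
    if not 0 < x < n:
        raise ValueError("start x must be a non-absorbing state 1..n-1")
    state, steps = x, 0
    for u in uniforms:
        state = (state + 1) % n if u < p else (state - 1) % n
        steps += 1
        if state == 0:
            return steps
    raise ValueError("uniform stream exhausted before absorption")
```

With Python's Mersenne Twister one could pass an endless stream such as `iter(random.Random(2024).random, None)`; repeating this with fresh output (rather than re-seeding with a fixed seed) yields a sample of absorption times.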
In recent years a novel kind of testing technique has been introduced for more careful verification of generators. The core idea of this class of methods is based on the observation that the binary sequence $B_1, B_2, \dots$ produced by a PRNG, after being properly rescaled, can be interpreted as a one-dimensional random walk $(S_n)_{n \ge 0}$ with $S_n = X_1 + \dots + X_n$, where $X_i = 2B_i - 1 \in \{-1, 1\}$. For random walks defined by truly random binary sequences a wide range of statistics have been considered over the years and a variety of corresponding stochastic laws have been derived (see, e.g., [17]). For a good PRNG we may expect that the walk built from its output will behave like a random walk driven by truly random bits. Hence, the following idea comes to mind: choose some probabilistic law that holds for truly random bit sequences and compare the theoretical distribution of the corresponding statistic with the empirical distribution calculated for sequences produced by a given PRNG in independent experiments. This comparison can be done, e.g., by computing the $p$-value of an appropriate test statistic under the null hypothesis that the sequence generated by this PRNG is truly random.
Another concept, named statistical distance based testing, was suggested in [18]. It relies on calculating some statistical distance, e.g., the total variation distance, between the theoretical and empirical distributions of the considered characteristics and rejecting a PRNG if the distance exceeds some threshold. We will also follow this approach, indicating the corresponding threshold. In [18] the authors derive their test statistics from the law of iterated logarithm for random walks (the procedure is called the LIL test). The procedure we propose uses a similar methodology and is based on the arcsine law. We made the code publicly available, see [19]. It includes the arcsine law based as well as the law of iterated logarithm based statistical tests, implementations of many PRNGs (more than described in this article) including the Flawed generator (see Section 4), and the seeds we used.
Organization of the paper
In Section 2 we define a general notion of a PRNG and recall the aforementioned stochastic laws for random walks. The testing method along with the error analysis is described in Section 3. A concise report on experimental results (including the Flawed generator introduced in Section 4) is given in Section 5. In Section 6 we mention other implementations of tests based on the arcsine law. We conclude in Section 7.
2 Pseudorandom generators and stochastic laws for random walks
2.1 Pseudorandom generators
The intuition behind a pseudorandom number generator is clear. However, let us give a formal definition, roughly following Asmussen and Glynn [20].
Definition 2.1.
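A pseudorandom number generator (PRNG) is a 5-tuple $(\mathcal{S}, \mathcal{U}, s_0, T, G)$, where $\mathcal{S}$ is a finite state space, $\mathcal{U}$ is a set of values, $s_0 \in \mathcal{S}$ is a so-called seed, i.e., an initial state in the sequence $(s_k)_{k \ge 0}$, a function $T: \mathcal{S} \to \mathcal{S}$ describes the transition between consecutive states, $s_k = T(s_{k-1})$, and $G: \mathcal{S} \to \mathcal{U}$ maps the generator’s state into the output.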
Usually $\mathcal{U}$ is the unit interval or $\mathcal{U} = \{0, 1, \dots, M-1\}$ for some $M \in \mathbb{N}$; the latter is used throughout the paper. Recall that an LCG (linear congruential generator) is a generator which updates its state according to the formula $s_k = (a s_{k-1} + c) \bmod M$. Thus, it is defined by three integers: a modulus $M$, a multiplier $a$, and an additive constant $c$. In the case $c = 0$, the generator is called an MCG (multiplicative congruential generator). For a detailed description of some commonly used PRNGs see the surveys [21, 22, 23] or the book [24].
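As a sketch of Definition 2.1, the generator below keeps the seed, the transition and the output map explicitly. The class and field names are ours, the finite sets are left implicit, and we emit outputs starting from $s_1$ (a convention choice). The example instantiates an MCG with $a = 48271$, $c = 0$, $M = 2^{31} - 1$, the parameters of C++11 `std::minstd_rand`, whose standard requires the 10000th output from the default seed 1 to be 399268537.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class PRNG:
    """A PRNG in the sense of Definition 2.1 (state space and value set left implicit)."""
    seed: int                          # s_0
    transition: Callable[[int], int]   # T: S -> S
    output: Callable[[int], int]       # G: S -> U

    def stream(self) -> Iterator[int]:
        s = self.seed
        while True:
            s = self.transition(s)     # s_k = T(s_{k-1})
            yield self.output(s)       # emit G(s_k), k >= 1


def lcg(seed: int, a: int, c: int, M: int) -> PRNG:
    """LCG s_k = (a * s_{k-1} + c) mod M; with c = 0 this is an MCG."""
    return PRNG(seed, lambda s: (a * s + c) % M, lambda s: s)


# MCG with the parameters of C++11 std::minstd_rand (default seed 1)
minstd = lcg(seed=1, a=48271, c=0, M=2**31 - 1)
```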
It is clear that both the input and the output of a random number generator can be viewed as finite sequences of bits. For a PRNG to be considered good, the output sequences should have a particular property, namely each returned bit has to be generated independently with equal probability of being 0 and 1. We say that a sequence of bits is truly random if it is a realization of a Bernoulli process with success probability $p = 1/2$.
Given a PRNG returning integers from the set $\{0, 1, \dots, M-1\}$, we may obtain a pseudorandom binary sequence $b$ of any given length using the following simple procedure. Namely, as long as the bit sequence $b$ is not sufficiently long, generate the next pseudorandom number $u$ and append its binary representation (on $k = \lceil \log_2 M \rceil$ bits) to the current content of $b$. In the ideal model, with the PRNG being a truly random number generator, such an algorithm produces truly random bit sequences provided that $M$ is a power of 2. Indeed, for $M = 2^k$ there is a one-to-one correspondence between $k$-bit sequences and the set $\{0, 1, \dots, 2^k - 1\}$. Hence, if each number is generated independently with uniform distribution on $\{0, 1, \dots, 2^k - 1\}$, then each combination of $k$ bits is equally likely and therefore each bit of the output sequence is independent and equal to 0 or 1 with probability $1/2$.
However, this is not true for $M \ne 2^k$. It is easy to observe that in such a case the generator is more likely to output 0s, and the generated bits are no longer independent. Thus, rather than simply outputting the bits of $u$, one may instead take the first $\ell$ bits of the binary representation of $u/M \in [0, 1)$ for some fixed $\ell$. Such a method has the advantage that it can easily be adapted to an underlying generator returning numbers from the unit interval, which is common for many PRNG implementations.
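The two bit-extraction rules above can be written as follows. This is a sketch with hypothetical helper names; `next_int` stands for any source of integers in $\{0, \dots, M-1\}$, and the leading bits of $u/M$ are computed exactly with integer arithmetic rather than through floating point.

```python
from typing import Callable, List, Optional


def int_to_bits(u: int, k: int) -> List[int]:
    """k-bit binary representation of u, most significant bit first."""
    return [(u >> (k - 1 - j)) & 1 for j in range(k)]


def leading_bits(u: int, M: int, ell: int) -> List[int]:
    """First ell bits of the binary expansion of u/M in [0, 1): floor(u * 2^ell / M)."""
    return int_to_bits((u << ell) // M, ell)


def bit_sequence(next_int: Callable[[], int], M: int, length: int,
                 ell: Optional[int] = None) -> List[int]:
    """Concatenate bits of successive outputs until `length` bits are collected.

    With ell=None, M must be a power of 2 and all log2(M) bits of each output are used;
    otherwise only the first ell bits of u/M are taken from each output u.
    """
    if ell is None:
        k = M.bit_length() - 1
        if M != 1 << k:
            raise ValueError("M is not a power of 2; pass ell to use leading bits of u/M")
        extract = lambda u: int_to_bits(u, k)
    else:
        extract = lambda u: leading_bits(u, M, ell)
    bits: List[int] = []
    while len(bits) < length:
        bits.extend(extract(next_int()))
    return bits[:length]
```

For instance, with `rng = random.Random(12345)`, the call `bit_sequence(lambda: rng.getrandbits(32), 2**32, 10**6)` would take all 32 bits of each output of Python's Mersenne Twister.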
2.2 Stochastic laws for random walks
Let $(B_i)_{i \ge 1}$ be a Bernoulli process with parameter $p$, i.e., a sequence of independent random variables with identical distribution $P(B_i = 1) = p = 1 - P(B_i = 0)$. A good PRNG should behave like a generator of a Bernoulli process with $p = 1/2$ (which we assume from now on). It will be, however, more convenient to consider the following transformed process
$$X_i = 2B_i - 1, \qquad S_0 = 0, \qquad S_n = \sum_{i=1}^{n} X_i, \quad n \ge 1. \tag{1}$$
The sequence $(X_i)_{i \ge 1}$ is $\{-1, 1\}$-valued; the process $(S_n)_{n \ge 0}$ is called a random walk.
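In code, the transformation (1) is a running sum of $\pm 1$ steps; the function name is ours.

```python
from itertools import accumulate
from typing import List


def random_walk(bits: List[int]) -> List[int]:
    """Return [S_0, S_1, ..., S_n] with S_0 = 0 and X_i = 2*B_i - 1 as in (1)."""
    return [0] + list(accumulate(2 * b - 1 for b in bits))
```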
The law of iterated logarithm
Of course $|S_n| \le n$. However, large values of $|S_n|$ occur with small probability, and in practice the values of $S_n$ lie in a much narrower range than $[-n, n]$. The weak and the strong law of large numbers imply that $\frac{S_n}{n} \xrightarrow{P} 0$ and $\frac{S_n}{n} \xrightarrow{a.s.} 0$, where $\xrightarrow{P}$ denotes convergence in probability and $\xrightarrow{a.s.}$ denotes almost sure convergence. Thus, the deviations of $S_n$ from 0 grow much slower than linearly. On the other hand, the central limit theorem states that $\frac{S_n}{\sqrt{n}} \xrightarrow{d} N(0, 1)$ (where $\xrightarrow{d}$ denotes convergence in distribution), which is in some sense a lower bound on the fluctuations of $S_n$ – they will leave the interval $[-\sqrt{n}, \sqrt{n}]$, since we have $\limsup_{n \to \infty} \frac{S_n}{\sqrt{n}} = \infty$ almost surely (implied by Kolmogorov’s 0-1 law, see, e.g., Theorem 5.1 in [25]). It turns out that the fluctuations can be estimated more precisely.
Theorem 2.2 (The law of iterated logarithm, [26], cf. also Chapter VIII.5 in [17]).
For a random walk we have
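$$\limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = 1 \quad \text{a.s.} \qquad \text{and} \qquad \liminf_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = -1 \quad \text{a.s.}$$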
Thus, to normalize $S_n$, dividing by $n$ is too strong and dividing by $\sqrt{n}$ is too weak. The fluctuations of $S_n$ from 0 grow proportionally to $\sqrt{2n \log\log n}$.
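The normalization in Theorem 2.2 can be explored numerically with a short sketch. Here the walk is driven by Python's `random` module (an MT19937 implementation), and the normalized values start at $k = 3$ because $\log\log k \le 0$ for $k \le 2$; both function names are hypothetical.

```python
import math
import random
from typing import List


def simulate_walk(n: int, seed: int) -> List[int]:
    """[S_0, ..., S_n] for a walk driven by Python's Mersenne Twister (random module)."""
    rng = random.Random(seed)
    walk = [0]
    for _ in range(n):
        walk.append(walk[-1] + (1 if rng.getrandbits(1) else -1))
    return walk


def lil_normalized(walk: List[int]) -> List[float]:
    """S_k / sqrt(2 k log log k) for k = 3, ..., n (log log k > 0 requires k >= 3)."""
    return [walk[k] / math.sqrt(2 * k * math.log(math.log(k)))
            for k in range(3, len(walk))]
```

Calling `lil_normalized(simulate_walk(10**6, seed=1))` gives one trajectory in the spirit of a single polyline of Figure 1; the actual values depend on the seed.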
To depict the law of iterated logarithm, we took output sequences from the Mersenne Twister MT19937 generator, each initialized with a random seed taken from http://www.random.org, where each output was of length . In Figure 1 we presented these 500 trajectories , where . Each trajectory is depicted by a single polyline. The darker the image the higher the density of trajectories. We can see that roughly corresponds to the fluctuations of . However, few trajectories after around billion steps are still outside . The law of iterated logarithm tells us that for appropriately large the trajectories will not leave with probability , what is not the case in Figure 1. It means that must be much larger than .
One could think that the following is a good test for randomness: fix some threshold $c$ and classify the considered PRNG as good if the difference between the number of ones and zeros never exceeds $c$. A large difference may suggest that zeros and ones have different probabilities of occurrence. However, the law of iterated logarithm tells us that this reasoning is wrong. Indeed, we should expect some fluctuations, and their absence means that a PRNG does not produce bits which can be considered random. This property of random walks was used by the authors in [18] to design a novel method of testing random number generators.
There is yet another interesting property. Define $Y_n = \frac{S_n}{\sqrt{2n \log\log n}}$. The law of iterated logarithm implies that $Y_n$ does not converge pointwise to any constant. However, it converges to 0 in probability. Let us fix some small $\varepsilon > 0$. For all sufficiently large $n$, with arbitrarily high probability, $Y_n$ will not leave $[-\varepsilon, \varepsilon]$. On the other hand, the law of iterated logarithm tells us that the process will be outside this interval infinitely many times. This apparent contradiction shows how unreliable our intuition can be for phenomena taking place at infinity.
The arcsine law
The observations described previously imply that, on average, the random walk will spend half of its time above the $x$-axis and half of its time below. However, the typical situation is counter-intuitive (at first glance): typically the random walk will either spend most of its time above or most of its time below the $x$-axis. This is expressed in Theorem 2.3 below (for reference see, e.g., [17]). Before we formulate the theorem, let us first introduce some notation. For a sequence $(X_i)_{i \ge 1}$, as defined in (1), let
$$D_k = \mathbb{1}\left(S_{k-1} > 0 \ \text{or} \ S_k > 0\right), \qquad k \ge 1, \tag{2}$$
where $\mathbb{1}$ is the indicator function. $D_k$ is equal to 1 if the number of ones exceeds the number of zeros either at step $k-1$ or at step $k$, and 0 otherwise (in the case of a tie, i.e., $S_k = 0$, we look at the previous step, with the convention $S_0 = 0$). In other words, $D_k = 1$ corresponds to the situation in which the line segment of the trajectory of the random walk between steps $k-1$ and $k$ is above the $x$-axis.
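A direct transcription of (2), assuming bits are given as a list of 0s and 1s (function names are ours):

```python
from typing import List


def above_axis_indicators(bits: List[int]) -> List[int]:
    """D_k = 1 if S_{k-1} > 0 or S_k > 0, i.e. segment (k-1, k) lies above the axis."""
    d, s_prev = [], 0                  # s_prev holds S_{k-1}, starting from S_0 = 0
    for b in bits:
        s = s_prev + (2 * b - 1)       # S_k = S_{k-1} + X_k
        d.append(1 if (s_prev > 0 or s > 0) else 0)
        s_prev = s
    return d


def fraction_above(bits: List[int]) -> float:
    """Fraction of the n segments lying above the axis, (1/n) * sum_k D_k."""
    return sum(above_axis_indicators(bits)) / len(bits)
```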
Theorem 2.3 (The arcsine law).
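Let $(B_i)_{i \ge 1}$ be a Bernoulli process with parameter $1/2$. Define $X_i = 2B_i - 1$, $S_n = \sum_{i=1}^{n} X_i$ and $D_k$ ($D_k$ is given in (2)). For $0 \le u \le 1$ we have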
$$\lim_{n \to \infty} P\left(\frac{1}{n}\sum_{k=1}^{n} D_k \le u\right) = \frac{2}{\pi}\arcsin\sqrt{u}.$$
The probability $P\left(\frac{1}{n}\sum_{k=1}^{n} D_k \le u\right)$ is the chance that the random walk was above the $x$-axis for at most a fraction $u$ of the time. The limiting distribution is called the arcsine distribution. Its density function is given by $f(u) = \frac{1}{\pi\sqrt{u(1-u)}}$, $u \in (0, 1)$, and its cumulative distribution function (cdf) is $F(u) = \frac{2}{\pi}\arcsin\sqrt{u}$, $u \in [0, 1]$. The U shape of the pdf clearly indicates that the fractions of time spent above and below the $x$-axis are more likely to be unequal than close to each other.
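The arcsine cdf and density are one-liners. The two module-level quantities compare the mass of the two 10% tails with that of a 10%-wide band around $1/2$, which quantifies the remark on the U-shaped density (roughly 0.41 versus 0.06).

```python
import math


def arcsine_cdf(u: float) -> float:
    """F(u) = (2/pi) * arcsin(sqrt(u)) for u in [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(u))


def arcsine_pdf(u: float) -> float:
    """f(u) = 1 / (pi * sqrt(u * (1 - u))) for u in (0, 1); unbounded at both ends."""
    return 1.0 / (math.pi * math.sqrt(u * (1.0 - u)))


# Mass of "at most 10%" plus "at least 90%" of the time above the axis ...
tails = arcsine_cdf(0.1) + (1.0 - arcsine_cdf(0.9))
# ... versus a band of the same total width around 1/2.
middle = arcsine_cdf(0.55) - arcsine_cdf(0.45)
```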
3 Testing PRNGs based on the arcsine law
In this Section we will show how to exploit the theoretical properties of random walks from the preceding discussion to design a practical routine for testing PRNGs. We describe our approach based on the arcsine law which we employ for experimental evaluation of several commonly used generators (the results are presented in Section 5). We also perform an error analysis of the proposed testing procedure, providing corresponding bounds on the approximation errors. Finally, we make some remarks on the reliability of our second level test.
3.1 The arcsine law based testing
The general idea of the test is the following. Take a sequence of bits generated by a PRNG, rescale them as in (1) and compare the empirical distribution of
$$S^{(n)}_{\mathrm{asin}} = \frac{1}{n}\sum_{k=1}^{n} D_k$$
(the fraction of time instants at which ones prevail over zeros) with its theoretical distribution assuming that truly random numbers were generated. In terms of hypothesis testing: given the null hypothesis $H_0$ that the bits in the sequence were generated independently and uniformly at random (vs. $H_1$: that the sequence was not randomly generated), the distribution of $S^{(n)}_{\mathrm{asin}}$ follows the arcsine law (Theorem 2.3), i.e., we can conclude that for large $n$ we have
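$$P\left(S^{(n)}_{\mathrm{asin}} \le u\right) \approx \frac{2}{\pi}\arcsin\sqrt{u}, \qquad u \in [0, 1] \tag{3}$$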
(we will be more specific about “≈” in Section 3.2). We follow the second level testing approach (cf. [4, 10]), i.e., we take into account several results from the same test over different sequences. To test a PRNG we generate $m$ sequences of length $n$ each, thus obtaining $m$ realizations of the variable $S^{(n)}_{\mathrm{asin}}$. Denoting by $a_i$ the value of the $i$-th simulation’s result (we call these basic tests), we then calculate the corresponding $p$-values
$$p_i = \frac{2}{\pi}\arcsin\sqrt{a_i}, \qquad i = 1, \dots, m.$$
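A sketch of the basic tests, where $a_i$ denotes the observed fraction for the $i$-th sequence and $p_i$ is taken to be the arcsine cdf evaluated at $a_i$; the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def fraction_above(bits: Sequence[int]) -> float:
    """Realization a of S_asin^(n): fraction of k with S_{k-1} > 0 or S_k > 0."""
    above, s_prev = 0, 0
    for b in bits:
        s = s_prev + (2 * b - 1)
        above += (s_prev > 0 or s > 0)
        s_prev = s
    return above / len(bits)


def basic_test_pvalues(sequences: Sequence[Sequence[int]]) -> List[float]:
    """p_i = (2/pi) * arcsin(sqrt(a_i)) for each of the m bit sequences."""
    return [2.0 / math.pi * math.asin(math.sqrt(fraction_above(seq)))
            for seq in sequences]
```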
Under $H_0$ the distribution of $p_i$, $i = 1, \dots, m$, should be uniform on $[0, 1]$. We fix some partition of $[0, 1]$ and count the number of $p$-values within each interval. In our tests we will use an $s$-element partition $\mathcal{P}^s = \{P_1, \dots, P_s\}$, where
[TABLE]
Now we define the measures $\mathbb{U}$ (the uniform measure on $[0, 1]$), $\mu_m$ (the empirical measure of the $p$-values), $E_j$ (the expected number of $p$-values within $P_j$) and $O_j$ (the number of observed $p$-values within $P_j$). For $j = 1, \dots, s$ let
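$$\mathbb{U}(P_j) = |P_j|, \qquad \mu_m(P_j) = \frac{1}{m}\,\#\{i \le m : p_i \in P_j\}, \qquad E_j = m\,\mathbb{U}(P_j), \qquad O_j = m\,\mu_m(P_j).$$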
We perform Pearson’s goodness-of-fit test, which uses the following test statistic
$$\chi^2 = \sum_{j=1}^{s} \frac{(O_j - E_j)^2}{E_j}.$$
Under the null hypothesis, $\chi^2$ has approximately the $\chi^2(s-1)$ distribution. We calculate the corresponding $p$-value
$$p_{\chi^2} = P\left(Z \ge \chi^2\right),$$
where $Z$ has the $\chi^2(s-1)$ distribution. Large values of $\chi^2$ – and thus small values of $p_{\chi^2}$ – make us suspect that a given PRNG is not good. Typically, we reject $H_0$ (i.e., we consider the test failed) if $p_{\chi^2} < \alpha$, where $\alpha$ is a predefined level of significance (for a second level test we use $\alpha = 0.0001$, as suggested by NIST). Note that the probability of rejecting $H_0$ when the sequence is generated by a perfect random generator (the so-called Type I error) is then equal to $\alpha$, up to the error of the $\chi^2$ approximation.
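The Pearson step can be sketched with the standard library only. Two assumptions: the partition is taken to be $s$ equal-width bins of $[0, 1]$ (the exact partition $\mathcal{P}^s$ of the paper is not reproduced here), and the $\chi^2$ tail is computed from the closed forms of the regularized upper incomplete gamma function for integer degrees of freedom instead of a library call. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, k: int) -> float:
    """P(Z >= x) for Z ~ chi^2 with integer k >= 1 degrees of freedom."""
    y = x / 2.0
    if k % 2 == 0:
        # Q(k/2, y) = exp(-y) * sum_{i=0}^{k/2-1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(k/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(k-1)/2} y^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (k - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def pearson_second_level(pvalues: Sequence[float], s: int) -> Tuple[float, float]:
    """Bin p-values into s equal-width bins; return (chi2 statistic, its p-value)."""
    m = len(pvalues)
    observed = [0] * s
    for p in pvalues:
        observed[min(int(p * s), s - 1)] += 1   # p = 1.0 goes to the last bin
    expected = m / s
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    return chi2, chi2_sf(chi2, s - 1)
```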
Another approach relies on statistical distance based testing, the technique presented in [18]. We consider the statistic
$$d_{tv}(\mathbb{U}, \mu_m) = \frac{1}{2}\sum_{j=1}^{s} \left|\mathbb{U}(P_j) - \mu_m(P_j)\right|,$$
i.e., the total variation distance between the theoretical distribution $\mathbb{U}$ and the empirical distribution $\mu_m$ (both restricted to the partition $\mathcal{P}^s$). Similarly, large values of $d_{tv}$ indicate that a given PRNG is not good. Concerning Type I error we will make use of the following lemma (see Lemma 3 in [27] or its reformulation, Lemma 1 in [28]).
Lemma 3.4.
Assume and consider the partition . Then, for all we have
[TABLE]
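The total variation statistic on the same hypothetical equal-width bins is even shorter; the rejection threshold that follows from Lemma 3.4 is not encoded here, and the function name is ours.

```python
from typing import Sequence


def tv_distance_uniform(pvalues: Sequence[float], s: int) -> float:
    """(1/2) * sum_j |U(P_j) - mu_m(P_j)| over s equal-width bins of [0, 1]."""
    m = len(pvalues)
    counts = [0] * s
    for p in pvalues:
        counts[min(int(p * s), s - 1)] += 1
    return 0.5 * sum(abs(c / m - 1.0 / s) for c in counts)
```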
To summarize, for a given PRNG we generate $m$ sequences of length $n$ each and choose $s$ (and thus the partition $\mathcal{P}^s$). We then calculate $d_{tv}$ and $\chi^2$ together with its $p$-value. We specify the thresholds for the $p$-value and for $d_{tv}$ indicating whether the test failed or not (the details are presented in Section 5). We denote the described procedure as the ASIN test.
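Putting the pieces together, a compact sketch of the ASIN test might look as follows. It assumes equal-width bins, draws bits from Python's `random` module (MT19937) through a hypothetical `mt_bits` helper, and returns the raw statistics without applying the thresholds of Section 5. For small $n$ the discreteness of $S^{(n)}_{\mathrm{asin}}$ matters, which is what the error analysis in Section 3.2 quantifies.

```python
import math
import random
from typing import Callable, Dict, List


def chi2_sf(x: float, k: int) -> float:
    """Upper tail of the chi^2 distribution with integer k >= 1 degrees of freedom."""
    y = x / 2.0
    if k % 2 == 0:
        term = total = 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    return math.erfc(math.sqrt(y)) + sum(
        math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (k - 1) // 2 + 1))


def fraction_above(bits: List[int]) -> float:
    above, s_prev = 0, 0
    for b in bits:
        s = s_prev + (2 * b - 1)
        above += (s_prev > 0 or s > 0)
        s_prev = s
    return above / len(bits)


def asin_test(next_bits: Callable[[int], List[int]], n: int, m: int, s: int) -> Dict[str, float]:
    """Second level arcsine test: m basic tests on fresh n-bit sequences, s equal bins."""
    counts = [0] * s
    for _ in range(m):
        a = fraction_above(next_bits(n))               # realization of S_asin^(n)
        p = 2.0 / math.pi * math.asin(math.sqrt(a))    # p-value of the basic test
        counts[min(int(p * s), s - 1)] += 1
    expected = m / s
    chi2 = sum((c - expected) ** 2 / expected for c in counts)
    tv = 0.5 * sum(abs(c / m - 1.0 / s) for c in counts)
    return {"chi2": chi2, "p_value": chi2_sf(chi2, s - 1), "tv": tv}


def mt_bits(seed: int) -> Callable[[int], List[int]]:
    """Bit source backed by Python's random module (a Mersenne Twister MT19937)."""
    rng = random.Random(seed)
    return lambda n: [rng.getrandbits(1) for _ in range(n)]
```

As a sanity check, feeding the alternating source `lambda n: [1, 0] * (n // 2)` makes every basic $p$-value equal to 1, so all of them fall into the last bin and the test rejects.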
Remark. Note that the described procedure for calculating $d_{tv}$ and $\chi^2$ is equivalent to the following one. Instead of calculating $p$-values of the $a_i$, we could directly count the number of values $a_i$ falling into each interval and compare the empirical distribution with the theoretical one. To be more precise, for $j = 1, \dots, s$ let
[TABLE]
where for . Then statistics and can be rewritten as
[TABLE]
This technique was presented in [18] (for the total variation and a few other distances), and this is how our implementation of the ASIN test [19] calculates the statistics.
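The equivalence in the Remark rests on $F$ being increasing: $p_i \in [\frac{j-1}{s}, \frac{j}{s})$ if and only if $a_i \in [F^{-1}(\frac{j-1}{s}), F^{-1}(\frac{j}{s}))$, with $F^{-1}(v) = \sin^2(\pi v / 2)$. The sketch below, again assuming equal-width bins in the $p$-value scale and using hypothetical function names, counts both ways.

```python
import bisect
import math
from typing import List, Sequence


def arcsine_quantile(v: float) -> float:
    """F^{-1}(v) = sin^2(pi v / 2), the inverse of F(u) = (2/pi) arcsin(sqrt(u))."""
    return math.sin(math.pi * v / 2.0) ** 2


def counts_via_quantiles(fractions: Sequence[float], s: int) -> List[int]:
    """Count the a_i directly in the intervals [F^{-1}((j-1)/s), F^{-1}(j/s))."""
    edges = [arcsine_quantile(j / s) for j in range(s + 1)]
    counts = [0] * s
    for a in fractions:
        counts[min(bisect.bisect_right(edges, a) - 1, s - 1)] += 1
    return counts


def counts_via_pvalues(fractions: Sequence[float], s: int) -> List[int]:
    """Transform each a_i to p_i = F(a_i) first, then bin into s equal-width bins."""
    counts = [0] * s
    for a in fractions:
        p = 2.0 / math.pi * math.asin(math.sqrt(a))
        counts[min(int(p * s), s - 1)] += 1
    return counts
```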
We could also calculate just one -value of the statistic for a longer sequence (say, for ) – i.e., perform a first level test. However, as mentioned in Section 1, the second level approach produces more accurate results (roughly speaking, the accuracy is related to the ability, given a non-random PRNG, of recognizing its sequences as non-random, see details in [4]).
It is worth noting that the following approach can be applied when or are slightly outside the acceptance region (e.g., if ), what suggests rejecting , but is not a strong evidence). Namely, double the length of the sequence, take a new output from the PRNG and apply the test again. Repeat the procedure (at most some predefined number of times) until the evidence is strong enough (e.g., ) or is accepted (e.g., ). This method, called “automation of statistical tests on randomness”, was proposed and analyzed in [29].
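The doubling procedure of [29] can be outlined as a loop. The thresholds in the text are elided, so `alpha_strong` and `alpha_weak` are hypothetical parameters, and `run_test` stands for any routine (hypothetical here) that draws a fresh output of the given length and returns a $p$-value.

```python
from typing import Callable, Tuple


def automated_test(run_test: Callable[[int], float], n0: int,
                   alpha_strong: float, alpha_weak: float,
                   max_rounds: int) -> Tuple[str, int]:
    """Double the length while the p-value stays in [alpha_strong, alpha_weak).

    Returns the decision ("accept", "reject" or "inconclusive") and the last length tested.
    """
    n = n0
    for round_no in range(max_rounds):
        p = run_test(n)                  # test on a fresh output of length n
        if p >= alpha_weak:
            return "accept", n
        if p < alpha_strong:
            return "reject", n
        if round_no < max_rounds - 1:    # grey zone: try again with twice the length
            n *= 2
    return "inconclusive", n
```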
3.2 Error analysis
3.2.1 Bounding errors in approximating -values in basic tests
In this subsection we will show a bound on the approximation error in (3). Recall that $S^{(n)}_{\mathrm{asin}} = \frac{1}{n}\sum_{k=1}^{n} D_k$.
Lemma 3.5**.**
Fix a partition and an even . Let be the cdf of the empirical distribution of under (stating that the bits were generated uniformly at random), i.e., . Then we have
[TABLE]
Proof. We will show that for fixed and such that we have
[TABLE]
Let us assume that . Let denote the probability that during steps in the first steps the random walk was above 0-axis, i.e., . The classical results on a simple random walk state that
[TABLE]
The standard proof of Theorem 2.3 (see, e.g., Chapter XII.8 in [30]) shows that converges to . In the following, we will bound the difference . We will use a version of Stirling’s formula stating that for each there exists , , such that
[TABLE]
Plugging (5) into each factorial appearing in (4) we have
[TABLE]
Thus, we get
[TABLE]
and
[TABLE]
For any it holds that and for we have that . Note that , what is equivalent to , what holds for any . Hence,
[TABLE]
what implies
[TABLE]
Fix and assume furthermore that . The function achieves the minimum value at the endpoints of the considered interval, thus
[TABLE]
We will estimate the approximation error in (3) in two steps. First, take two numbers such that . We have
[TABLE]
The second kind of errors in probability estimates given by (3) is caused by approximating the sum by an integral. Let us consider an arbitrary function differentiable in the interval . Split into subintervals of length and let be an arbitrary point in the interval containing . Denote by and the maximum and the minimum value of on that interval, respectively. Using Lagrange’s mean value theorem we obtain
[TABLE]
For we have and Hence, in the considered interval we have
[TABLE]
We also have
[TABLE]
Taking we obtain
[TABLE]
which justifies the approximation (3) for . To complete the analysis we need to investigate the errors “on the boundaries” of the unit interval, i.e., for (and, by symmetry, for ). We get
[TABLE]
where the last inequality follows directly from the preceding calculations. ∎
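Lemma 3.5 can be checked numerically. For even $n$ the number of indices $k$ with $D_k = 1$ is even, and the classical discrete arcsine law gives $P(\sum_k D_k = 2j) = u_{2j}\,u_{n-2j}$ with $u_{2i} = \binom{2i}{i}4^{-i}$. The sketch below (function names ours) computes this distribution exactly, verifies it by brute force for small $n$, and evaluates the exact supremum distance between the cdf of $S^{(n)}_{\mathrm{asin}}$ and the arcsine cdf.

```python
import math
from itertools import product
from typing import List, Sequence


def exact_pmf(n: int) -> List[float]:
    """P(sum_k D_k = 2j), j = 0..n/2, for a simple random walk of even length n."""
    N = n // 2
    u = [math.comb(2 * i, i) / 4 ** i for i in range(N + 1)]   # u_{2i} = P(S_{2i} = 0)
    return [u[j] * u[N - j] for j in range(N + 1)]


def sojourn_count(bits: Sequence[int]) -> int:
    """Number of k with S_{k-1} > 0 or S_k > 0."""
    above, s_prev = 0, 0
    for b in bits:
        s = s_prev + (2 * b - 1)
        above += (s_prev > 0 or s > 0)
        s_prev = s
    return above


def brute_force_pmf(n: int) -> List[float]:
    """Same distribution by enumerating all 2^n bit sequences (small n only)."""
    pmf = [0.0] * (n // 2 + 1)
    for bits in product((0, 1), repeat=n):
        pmf[sojourn_count(bits) // 2] += 1.0 / 2 ** n
    return pmf


def sup_cdf_error(n: int) -> float:
    """sup_x |P(S_asin^(n) <= x) - (2/pi) arcsin(sqrt(x))| over x in [0, 1]."""
    err, cum = 0.0, 0.0
    for j, mass in enumerate(exact_pmf(n)):
        F = 2.0 / math.pi * math.asin(math.sqrt(2 * j / n))
        err = max(err, abs(cum - F))   # left limit of the step cdf at x = 2j/n
        cum += mass
        err = max(err, abs(cum - F))   # value of the step cdf at x = 2j/n
    return err
```

The distance is dominated by the atom at 0, of mass $u_n \approx 1/\sqrt{\pi n / 2}$, so it decays like $n^{-1/2}$.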
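Remark. Let $X_1, X_2, \dots$ be zero-mean i.i.d. random variables with $E X_1^2 = \sigma^2 > 0$ and $E|X_1|^3 = \rho < \infty$. Denote $S_n = X_1 + \dots + X_n$. The central limit theorem states that $Z$, a standard normal random variable (denote its cdf by $\Phi$), is the limiting distribution of $\frac{S_n}{\sigma\sqrt{n}}$ (denote its cdf by $F_n$). It means that for large $n$ we can approximate $F_n$ by $\Phi$, and the approximation error is bounded by the Berry-Esseen inequality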
$$\sup_{x \in \mathbb{R}} \left|F_n(x) - \Phi(x)\right| \le \frac{C\rho}{\sigma^3\sqrt{n}},$$
where $C$ is a positive constant (in the original paper [31] it was shown that , in [32] it was shown that ). Lemma 3.5 is thus a Berry-Esseen type inequality for approximating the cdf of $S^{(n)}_{\mathrm{asin}}$ by the arcsine cdf $\frac{2}{\pi}\arcsin\sqrt{x}$, tailored to our needs.
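To see the $1/\sqrt{n}$ rate in the simplest case, take the $\pm 1$ steps of (1), for which $\sigma = \rho = 1$. The exact supremum distance between the cdf of $S_n/\sqrt{n}$ and $\Phi$ can then be computed from binomial probabilities. The sketch (function name ours) shows that $\sqrt{n}$ times this distance is close to $1/\sqrt{2\pi} \approx 0.40$ for $n = 1000$, the dominant contribution being half of the atom $P(S_n = 0)$.

```python
import math


def sup_clt_error(n: int) -> float:
    """Exact sup_x |P(S_n / sqrt(n) <= x) - Phi(x)| for the simple random walk S_n."""
    def Phi(x: float) -> float:
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    err, cum = 0.0, 0.0
    for ones in range(n + 1):                  # S_n = 2 * ones - n
        x = (2 * ones - n) / math.sqrt(n)
        err = max(err, abs(cum - Phi(x)))      # left limit of the step cdf at x
        cum += math.comb(n, ones) / 2 ** n
        err = max(err, abs(cum - Phi(x)))      # value of the step cdf at x
    return err
```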
3.2.2 Reliability of the results from the second level test
Following [10], we say that a basic test (calculating ) is not reliable if, due to approximation errors in the computations of -values, the distribution of for truly random numbers is not uniform. We test the uniformity via and . Since we compare two continuous distributions, some discretization needs to be applied. In our testing procedure we use a partition for this, splitting the interval into intervals (i.e., the bins). Lemma 3.5 states that a maximum error in the computation of is bounded by (note that implicitly depends on ). It means that a -value that should belong to a given bin can be found in the neighboring ones only if the distance between and one of the endpoints of a given bin is less than . Thus, this is also the fraction of -values that can be found in wrong bins. The maximum propagated deviation is twice the error (since most bins have two neighbors), i.e.,
[TABLE]
Under the distribution of the numbers in the bins is a multinomial distribution. Indeed, this is equivalent to throwing balls independently into bins, where the probability of choosing first and last bin is and for all remaining bins. The variance of the ratio of number of balls in bin is equal to , and for bin is equal to . We have , where is the expected statistical deviation of the ratio of -values found in a given bin. We expect that the error in approximating -values propagates into an additional deviation. If the deviation is smaller than the statistical deviation, i.e., if
[TABLE]
then we say that the second level test is reliable. Note that the reliability of a test imposes a restriction on the relation between the length of a sequence used for each basic test (i.e., $n$) and the number of basic tests ($m$). Inequality (6) implies a lower bound on , namely
[TABLE]
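The reliability condition can be checked numerically. The Python sketch below is a minimal illustration that rests on three assumptions. First, the error bound ε on a single p-value is passed in as an input (the paper obtains it from Lemma 3.5). Second, the statistical deviation is taken to be the smallest binomial standard deviation √(p_i(1-p_i)/N) over all bins. Third, the propagated deviation is 2ε, as described above. The bin probabilities in the example are hypothetical.

```python
import math

def statistical_deviation(bin_probs, num_tests):
    """Smallest standard deviation of the fraction of p-values in a bin.

    With num_tests independent p-values, the count in bin i is
    Binomial(num_tests, p_i), so the fraction has variance p_i (1 - p_i) / num_tests.
    """
    return min(math.sqrt(p * (1.0 - p) / num_tests) for p in bin_probs)

def is_reliable(eps, bin_probs, num_tests):
    """Reliability check: the propagated deviation 2*eps must stay below the
    statistical deviation (eps = bound on the error of a single p-value)."""
    return 2.0 * eps < statistical_deviation(bin_probs, num_tests)

def max_num_tests(eps, bin_probs):
    """Bound B such that 2*eps < sqrt(p(1-p)/N) holds for every bin iff N < B."""
    return min(p * (1.0 - p) for p in bin_probs) / (4.0 * eps ** 2)

# Hypothetical example: first and last bins carry half the mass of the others.
probs = [0.025] + [0.05] * 19 + [0.025]
print(is_reliable(1e-4, probs, 10_000), max_num_tests(1e-4, probs))
```

Rearranging 2ε < √(p(1-p)/N) gives N < p(1-p)/(4ε²). For a fixed number N of basic tests, ε must therefore be small enough. Assuming ε decreases as n grows, this is the lower bound on n mentioned above.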
4 The Flawed PRNG
In this section we present Flawed, a family of PRNGs. The family depends on three parameters: (a PRNG, e.g., the Mersenne Twister), (a small integer parameter, e.g., ) and . Flawed generates the same output as for a fraction of all possible seeds. For the remaining fraction of seeds it outputs bits such that the corresponding walk of length spends exactly half of the time ( steps) above zero and exactly half of the time below zero. In the following we denote .
4.1 Dyck Paths
To generate walks with the aforementioned property we will use Dyck paths, i.e., walks starting and ending at 0 with the property that for each prefix the number of ones is not smaller than the number of zeros.
Definition 4.6.
A sequence of bits is called a Dyck path if the corresponding walk fulfills and . The set of all Dyck paths of length is denoted by .
Thus, a Dyck path of length corresponds to a valid grouping of pairs of parentheses. We have ( is the -th Catalan number).
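A brute-force check for small lengths shows the definition and the Catalan count at work. In this Python sketch (the helper names are ours), a bit 1 is an up-step and a bit 0 is a down-step, as in the definition above.

```python
from itertools import product
from math import comb

def is_dyck(bits):
    """Dyck path check: the walk (1 = up, 0 = down) never goes below 0 and ends at 0."""
    level = 0
    for b in bits:
        level += 1 if b == 1 else -1
        if level < 0:          # some prefix has more zeros than ones
            return False
    return level == 0

def catalan(n):
    return comb(2 * n, n) // (n + 1)

# Brute force over all 2^(2n) bit strings: |D_{2n}| equals the n-th Catalan number.
for n in range(1, 8):
    assert sum(is_dyck(bits) for bits in product((0, 1), repeat=2 * n)) == catalan(n)
```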
4.2 Sampling Dyck Paths
We are interested in generating Dyck paths uniformly at random. To achieve this goal we will use the following three ingredients.
(1) Walk sampling
Let be the set of sequences of bits such that the corresponding walk ends at , i.e., . One can easily sample a sequence uniformly at random – it is enough to make a random permutation of the vector of bits , consisting of zeros and ones.
(2) transformation
One can obtain a Dyck path of length from using Algorithm 1.
Observe that transforms into a Dyck path. This follows from simple observations:
1. has exactly zeros and ones;
2. since , we have , and after is removed, has exactly bits equal to 0 and bits equal to 1;
3. from the definition of (which in particular enforces ), the walk corresponding to bits cannot go below 0.

An example of the transformation is presented in Figure 2.
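Ingredients (1) and (2) can be sketched in a few lines of Python. Algorithm 1 itself is not reproduced in this text, so the transformation below is an assumption. It uses the standard Cycle Lemma construction: rotate the sequence so that it starts right after the first time its walk reaches the global minimum, then drop the final bit, which is always a down-step (0). This is consistent with the three observations above, but the paper's Algorithm 1 may differ in details. The function names are hypothetical.

```python
import random

def sample_walk_to_minus_one(n, rng=random):
    """Ingredient (1): uniform sequence of 2n+1 bits with n ones and n+1 zeros,
    i.e. a walk ending at -1, obtained from a random permutation of the multiset."""
    bits = [1] * n + [0] * (n + 1)
    rng.shuffle(bits)
    return bits

def cycle_transform(bits):
    """Ingredient (2), assumed form: rotate the sequence so that it starts right
    after the first time the walk reaches its global minimum, then drop the last
    bit (always a down-step). The result is a Dyck path of length 2n."""
    level, min_level, first_min = 0, 0, 0
    for i, b in enumerate(bits, start=1):
        level += 1 if b == 1 else -1
        if level < min_level:          # strict '<' keeps the first minimum
            min_level, first_min = level, i
    rotated = bits[first_min:] + bits[:first_min]
    assert rotated[-1] == 0
    return rotated[:-1]

path = cycle_transform(sample_walk_to_minus_one(5, random.Random(42)))
```

For instance, the walk of 0, 1, 0, 0, 1 reaches its minimum -2 first at step 4. Rotating gives 1, 0, 1, 0, 0, and dropping the last bit leaves the Dyck path 1, 0, 1, 0.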
(3) The Cycle Lemma
The correspondence between the set and the set is expressed by the Cycle Lemma (see, e.g., [33]).
Lemma 4.7 (The Cycle Lemma).
For any the path is a Dyck path. Moreover, any Dyck path in is the image of exactly paths in .
Thus, to sample a Dyck path of length uniformly at random, it suffices to run Algorithm 2.
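Combining the two ingredients gives a sampler in the spirit of Algorithm 2. In the Python sketch below, the names are hypothetical and the transformation is the standard first-minimum rotation, assumed rather than copied from Algorithm 1. The sketch also verifies the Cycle Lemma exhaustively for small n. Every Dyck path of length 2n is hit by exactly 2n+1 of the C(2n+1, n) walks ending at -1, so a uniform walk yields a uniform Dyck path.

```python
import random
from collections import Counter
from itertools import combinations

def cycle_transform(bits):
    """Rotate after the first global minimum of the walk and drop the final 0."""
    level, min_level, first_min = 0, 0, 0
    for i, b in enumerate(bits, start=1):
        level += 1 if b == 1 else -1
        if level < min_level:
            min_level, first_min = level, i
    return (bits[first_min:] + bits[:first_min])[:-1]

def sample_dyck_path(n, rng=random):
    """Uniform Dyck path of length 2n: uniform walk ending at -1, then transform."""
    bits = [1] * n + [0] * (n + 1)
    rng.shuffle(bits)
    return cycle_transform(bits)

def preimage_counts(n):
    """Map every walk of length 2n+1 ending at -1 and count hits per Dyck path."""
    counts = Counter()
    for ones in combinations(range(2 * n + 1), n):   # positions of the n ones
        bits = [1 if i in ones else 0 for i in range(2 * n + 1)]
        counts[tuple(cycle_transform(bits))] += 1
    return counts

# Cycle Lemma check: each Dyck path of length 2n has exactly 2n+1 preimages.
for n in range(1, 7):
    assert set(preimage_counts(n).values()) == {2 * n + 1}
```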
4.3 The Flawed generator
As mentioned at the beginning of this section, the generator Flawed, described as Algorithm 3, works exactly like the underlying generator for a fraction of seeds (lines 1-2). For the remaining fraction of seeds (lines 3-8, the “else” branch) the output is generated as follows.
1. The first bits are exactly the same as the first bits of (line 4).
2. The next bits are generated as follows:
   - (a) a pseudorandom permutation is generated (line 5),
   - (b) the bit is set to be equal to (lines 6-8).

   As a result, the first bits contain equally many zeros and ones; these bits are denoted by (i.e., the corresponding walk is at zero at step ).
3. The remaining bits (denoted by ) in the block are obtained by calling DyckPaths (Algorithm 4). As a result, the whole block of output bits (the concatenation of and ) has the property that the corresponding walk spends the same number of steps above and below 0 (DyckPaths is described below).
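Steps 1 and 2 can be illustrated with a short Python sketch. The exact bit rule of lines 6-8 is not visible in this text, so the sketch assumes one rule that produces the stated balance: the second half is the bitwise complement of the first half read in permuted order, b[m+i] = 1 - b[π(i)]. Python's random.Random (a Mersenne Twister, MT19937) stands in for the underlying generator. The seed-dependent choice between the two branches of Algorithm 3 is omitted, and the names are hypothetical.

```python
import random

def balanced_prefix(m, rng):
    """Steps 1-2 (sketch): 2m bits whose walk returns to 0 at step 2m.

    ASSUMPTION: lines 6-8 are modelled as b[m + i] = 1 - b[pi(i)], i.e. the
    second half complements the first half read in a pseudorandom order.
    `rng` is a random.Random instance standing in for the underlying PRNG.
    """
    first = [rng.getrandbits(1) for _ in range(m)]  # step 1: bits of the underlying PRNG
    pi = list(range(m))
    rng.shuffle(pi)                                  # step 2(a): pseudorandom permutation
    second = [1 - first[pi[i]] for i in range(m)]    # step 2(b): complemented bits
    # `first` has k ones and `second` has m - k ones, so the 2m bits contain m ones.
    return first + second

prefix = balanced_prefix(8, random.Random(2018))
```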
Example 4.8 (Flawed).
Let and let the seed be such that lines 4-10 are executed. Let the result of line 4 be and let line 5 return the permutation:
[TABLE]
Then the bits computed in lines 6-8 are:
[TABLE]
These bits are then used as input to DyckPaths in line 10. The example is continued in Example 4.9.
Let us assume that Algorithm 3 (Flawed) has generated bits by executing lines . Then the corresponding random walk spent time under the x-axis and time over the x-axis (). The goal of DyckPaths is to generate bits in such a way that:
1. , and
2. .
Then, if one concatenates the sequences and , the corresponding random walk satisfies .
The following is an informal explanation of the procedure (the formal description is provided by Algorithm 4).
1. The walk corresponding to is given by .
2. The sequence is defined as: , for .
3. The set of points where the walk changes its sign is defined as .
4. The elements of are sorted in increasing order (obtaining ).
5. The sequence is defined as: , and the next “left-ends” as (for ).
6. The set is defined as , for .
7. Bits () are chosen so that the whole walk spends the same number of steps over and under the x-axis. Dyck paths are generated (the definitions of and imply that is even): , for , for some hash function . We use a hash function here to obtain differently sampled Dyck paths. This is achieved by re-seeding the generator so that it depends on:
   - (a) the current ,
   - (b) a single bit equal to , which corresponds to the type of sequence one wants to get (over or under the x-axis),
   - (c) the path number.
8. A relative ordering of the paths generated in the previous step is obtained from a permutation .
9. The resulting bits are obtained by concatenating the permuted Dyck paths.
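A simplified Python sketch illustrates what DyckPaths has to achieve. It is not Algorithm 4. It skips the sign-change analysis of the prefix (steps 3-6) and does not re-seed with a hash function. It also uses one positive and one negative excursion instead of several permuted Dyck paths. The sketch assumes the following:

- the prefix walk ends at 0;
- a step lies above the axis if at least one of its endpoints is positive;
- the block length L is a multiple of 4, and the prefix takes at most half of it.

Under these assumptions, a Dyck path of length A adds A steps above the axis and a complemented Dyck path of length B adds B steps below it. Choosing A = L/2 - T+ and B = L/2 - T- (T+ and T- are the prefix's steps above and below, our notation) balances the block. The names are hypothetical.

```python
import random

def sample_dyck_path(n, rng):
    """Uniform Dyck path of length 2n via the Cycle Lemma (1 = up, 0 = down)."""
    bits = [1] * n + [0] * (n + 1)
    rng.shuffle(bits)
    level, min_level, first_min = 0, 0, 0
    for i, b in enumerate(bits, start=1):
        level += 1 if b == 1 else -1
        if level < min_level:
            min_level, first_min = level, i
    return (bits[first_min:] + bits[:first_min])[:-1]

def steps_above_below(bits):
    """Count steps above/below the axis (a step is above if one endpoint is > 0)."""
    level, above, below = 0, 0, 0
    for b in bits:
        nxt = level + (1 if b == 1 else -1)
        if max(level, nxt) > 0:
            above += 1
        else:
            below += 1
        level = nxt
    return above, below

def complete_balanced(prefix, block_len, rng):
    """Append bits so that the whole block spends block_len/2 steps above the axis.

    Simplification: one positive excursion (a Dyck path) and one negative
    excursion (a complemented Dyck path), concatenated in random order.
    """
    assert sum(1 if b else -1 for b in prefix) == 0, "prefix walk must end at 0"
    above, below = steps_above_below(prefix)
    need_above = block_len // 2 - above
    need_below = block_len // 2 - below
    assert need_above >= 0 and need_below >= 0
    assert need_above % 2 == 0 and need_below % 2 == 0
    up = sample_dyck_path(need_above // 2, rng)
    down = [1 - b for b in sample_dyck_path(need_below // 2, rng)]
    parts = [up, down]
    rng.shuffle(parts)  # random relative order of the two excursions
    return prefix + parts[0] + parts[1]

block = complete_balanced([1, 1, 0, 0, 0, 1], 16, random.Random(7))
```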
Example 4.9 (DyckPaths).
Let the input to DyckPaths be . Then and , and thus .
Let the outputs of sampleDyckPath (called in lines 7-9 of DyckPaths) be , , , .
Let
[TABLE]
Then .
Ten sample trajectories of the Flawed generator (all generated by the Dyck path-based part of Algorithm 3) are depicted in Figure 3 (the instance of Flawed was initialized with the parameters , with rng being the Mersenne Twister).
5 Experimental results
In this section we briefly report the results of testing some widely used PRNGs implemented in the standard libraries of various programming languages. We applied the ASIN test to different generators, including the implementations of the standard C/C++ linear congruential generators, the standard generator rand from the GNU C Library, the Mersenne Twister, Minstd and the CMRG generator. As our last example we show the results of testing the Flawed generator. Our ASIN test identifies Flawed as non-random, whereas it passes many other tests, including all closely related procedures (the swalk_RandomWalk1 test from TestU01 with statistics H, M, J, R and C; see Table 2).
Each PRNG was tested by generating sequences of length , using the partition , i.e., . For these parameters our second level test is reliable (see Section 3.2.2): , the expected statistical deviation of the fraction of p-values found in a given bin, is greater than , which significantly exceeds the maximum propagated error , i.e., (6) holds. Note that for the inequality (7) yields:
1. for ,
2. for ,
3. , for .
In the experiments we used our own implementations of the tested PRNGs, except for the Mersenne Twister. For the Mersenne Twister we used the 64-bit version from the C++11 standard library, i.e., the class std::mt19937_64, which is, however, known to have some problems [34]. The generators were initialized with random seeds from http://www.random.org [35], and each sequence was generated using a different seed.
The results are presented in Table 1. The values indicating that should be rejected (with respect to the significance level , the value suggested by NIST for second level tests) are in bold. For the p-values these are simply the values smaller than or equal to . Concerning the values of , Lemma 3.4 implies that for we have . It can be checked that for , in other words
[TABLE]
i.e., we reject if the value of is larger than . Note that for all results in Table 1, the statistic and the p-value of either both reject or both accept it.
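The core computation of a second level arcsine test can be sketched in Python. For each of N walks, compute the fraction of steps above the axis and bin these fractions. Then compare the bin frequencies with the arcsine probabilities via the chi-square statistic and the total variation distance. This is an illustration, not the implementation from [19]. The equal-width partition, the sample sizes and the step convention (a step is above the axis if one endpoint is positive) are assumptions. The bin probabilities come from the limiting arcsine cdf, so for finite n they carry exactly the kind of approximation error that Lemma 3.5 bounds. random.Random (MT19937) serves as the generator under test.

```python
import math
import random
from bisect import bisect_right

def arcsine_cdf(x):
    """Limiting cdf of the fraction of time above the axis: (2/pi) arcsin(sqrt(x))."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

def fraction_above(bits):
    """Fraction of steps above the axis (a step is above if one endpoint is > 0)."""
    level, above = 0, 0
    for b in bits:
        nxt = level + (1 if b == 1 else -1)
        if max(level, nxt) > 0:
            above += 1
        level = nxt
    return above / len(bits)

def asin_statistics(fractions, edges):
    """Chi-square statistic and total variation distance between the binned
    empirical distribution of `fractions` and the arcsine bin probabilities."""
    k = len(edges) - 1
    probs = [arcsine_cdf(edges[i + 1]) - arcsine_cdf(edges[i]) for i in range(k)]
    counts = [0] * k
    for x in fractions:
        counts[min(bisect_right(edges, x) - 1, k - 1)] += 1  # bins [a, b), last one closed
    n_walks = len(fractions)
    chi2 = sum((c - n_walks * p) ** 2 / (n_walks * p) for c, p in zip(counts, probs))
    tv = 0.5 * sum(abs(c / n_walks - p) for c, p in zip(counts, probs))
    return chi2, tv

# Hypothetical run: 200 walks of length 1000 from Python's Mersenne Twister.
rng = random.Random(2018)
edges = [i / 20 for i in range(21)]  # hypothetical partition into 20 equal bins
fractions = [fraction_above([rng.getrandbits(1) for _ in range(1000)]) for _ in range(200)]
chi2, tv = asin_statistics(fractions, edges)
```

A p-value can then be taken from the chi-square distribution with k - 1 degrees of freedom for k bins, e.g., scipy.stats.chi2.sf(chi2, k - 1) if SciPy is available.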
We also computed the swalk_RandomWalk1 statistics from TestU01 for 10000 sequences of length of each PRNG, using the following parameters: . The results, including those for the Flawed generator described in Section 4, are given in Table 2. For each of the statistics H, M, J, R and C, the corresponding p-values were obtained from the chi-square statistics. For convenience, the p-values of are also included in Table 2 (in the column ).
Our ASIN test rejects the hypothesis that the MS Visual C++ PRNG and Minstd with multiplier 48271 are good PRNGs (Minstd with multiplier 16807 gave similar results, not reported here). This is indicated both by and by the value of . We also ran the experiments for the procedure rand from the standard library of Borland C/C++ (not included here); the outcomes are very similar to those for the standard PRNG of MS Visual C++. Note that for none of the p-values calculated by swalk_RandomWalk1 from TestU01 suggests rejecting the hypothesis that the MS Visual C++ PRNG is good, whereas the statistics H and C indicate that there may be some flaws in Minstd 48271. It is worth mentioning that the MS Visual C++ PRNG passes the NIST Test Suite [8], as pointed out in [18]. Despite its weaknesses, Minstd became part of the C++11 standard library. It is implemented by the classes std::minstd_rand0 (with multiplier 16807) and std::minstd_rand (with multiplier 48271). As can be seen in both Table 1 and Table 2, GNU C and MT19937-64 can both be considered good. The results for the CMRG generator (not reported here) were similar to those for MT19937-64.
The source code of our implementation is publicly available [19]; it includes the Flawed PRNG as well as the Law of Iterated Logarithm test from [18].
Discussion of the influence of the parameter of the Flawed PRNG on the statistic
Recall that in Algorithm 3 the parameter corresponds to the fraction of simulations that spend exactly half of the time above and half of the time below the x-axis, i.e., we have for simulations. Note that the p-value is then also equal to . The remaining simulations come from the underlying generator . Let us assume that returns truly random numbers.
Concerning the statistic, we have and . Set . For an “ideal” we would have and . Thus,
[TABLE]
For the parameters we have , and the corresponding p-value is less than . This means that even if produces truly random numbers, the ASIN test should recognize the Flawed generator as not good.
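The effect of the parameter can also be quantified in closed form under idealised assumptions. Write δ for the fraction of the N walks that spend exactly half of their time above the axis (our notation). All of these walks fall into the bin j containing 1/2, and the remaining walks are assumed to follow the arcsine bin probabilities p_i exactly. The observed frequencies are then (1-δ)p_i + δ·1{i=j}. A short calculation gives χ² = N δ² (1 - p_j)/p_j and a total variation distance of δ(1 - p_j). The Python sketch below computes both quantities directly. The names and the partition are hypothetical. With an even number of equal-width bins, 1/2 is a bin endpoint and is assigned here to the bin starting at 1/2.

```python
import math
from bisect import bisect_right

def arcsine_bin_probs(edges):
    """Arcsine probabilities of the bins [edges[i], edges[i+1])."""
    def g(x):
        return 2.0 / math.pi * math.asin(math.sqrt(x))
    return [g(b) - g(a) for a, b in zip(edges, edges[1:])]

def idealised_effect(delta, num_walks, edges):
    """Chi-square statistic and TV distance when a fraction `delta` of the walks
    spend exactly half of the time above the axis and the rest follow the
    arcsine bin probabilities exactly (no sampling noise)."""
    probs = arcsine_bin_probs(edges)
    j = min(bisect_right(edges, 0.5) - 1, len(probs) - 1)  # bin that receives 1/2
    observed = [(1 - delta) * p + (delta if i == j else 0.0) for i, p in enumerate(probs)]
    chi2 = num_walks * sum((o - p) ** 2 / p for o, p in zip(observed, probs))
    tv = 0.5 * sum(abs(o - p) for o, p in zip(observed, probs))
    # Closed forms: chi2 = num_walks * delta**2 * (1 - p_j) / p_j, tv = delta * (1 - p_j).
    return chi2, tv

edges = [i / 20 for i in range(21)]  # hypothetical partition
print(idealised_effect(0.01, 10_000, edges))
```

Since p_j is small for bins near 1/2, where the arcsine density is lowest, χ² grows like N δ²/p_j. Even a small δ is therefore visible once N is large.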
5.1 Results of TestU01 for Flawed
We ran several general-purpose tests against the Flawed generator. It passed all 15 tests of SmallCrush. For the Mersenne Twister (MT) and for we ran BigCrush. The tests that the generators failed are presented in Table 3.
6 Notes on Takashima’s method for testing PRNGs and the arcsine test implementation from TestU01
The idea of using the arcsine law for developing statistical tests for the empirical evaluation of PRNGs was previously proposed by Takashima in [36, 37, 38]. In this series of articles, test statistics based on the arcsine law were applied to assess the randomness of the output of maximum-length linearly recurring sequences (m-sequences for short). The experimental results presented there clearly show that the bits produced by this family of PRNGs are biased. Besides revealing the weakness of m-sequences, these outcomes also show that Takashima’s tests are effective and worth applying in practice.
The approach introduced in [36, 37] can be briefly described as follows. After a PRNG is initialized, a sequence of bits is generated and divided into subsequences of length . Each subsequence is then used for constructing a random walk. For each of these sample random walks, the value of a test statistic based on the arcsine law is calculated. The investigated statistic, called the sojourn time in [36, 37] (denote it by ), is the time spent by a random walk above the x-axis. From realizations of this statistic, an empirical distribution of the sojourn time is derived and compared with its theoretical distribution via a chi-square test. The whole procedure is repeated times, yielding a set of test statistic values . The final step of Takashima’s testing method is to count the values falling between the -th and the -th percentile and those above the -th percentile of the respective distribution. These two counts are then the basis for deciding whether should be rejected. Note that for this is a third level test, which in general is not reliable, as shown in [3].
In [37], the author also considers a slightly modified variant of the procedure, in which the chi-square test is combined with the Kolmogorov-Smirnov test. Another method, presented in [38], exploits the relations between the sojourn time and the last visit time for one-dimensional random walks.
It is worth noting that in our simulations we used binary sequences of length at least . Thus, a direct application of Takashima’s methods from [36, 37] would require a large amount of additional memory to store the values of .
As statistical tests based on the arcsine law have proven useful in detecting flaws of some PRNGs, such procedures were implemented in the TestU01 library (see [7]). This tool, developed by L’Ecuyer and Simard, provides a wide variety of functions for the empirical examination of PRNGs. One of its test modules, swalk, contains the procedure swalk_RandomWalk1, which calculates several test statistics for a sample of random walks constructed from chosen bits of generated binary sequences. Among them is Statistic J, which implements the test based on the arcsine law. This procedure is similar to ours: the calculated values of the test statistic are grouped according to some partition, and their empirical distribution is compared with the theoretical one by means of the chi-square test. The main difference is that in our testing method the partition size is a parameter chosen by the user, whereas the partition used by swalk_RandomWalk1 is calculated automatically, depending on the tested sequence. Moreover, we provide bounds on the approximation errors in the computation of p-values (a Berry-Esseen type inequality), which ensures the reliability of the whole testing procedure.
7 Conclusions
In this paper we analyzed a method for testing PRNGs based on the arcsine law for random walks. Our procedure is a second level statistical test. We also provided a detailed error analysis of the proposed method. The approximation errors in the calculation of p-values are bounded by a Berry-Esseen type inequality, which allows us to control the overall error and ensures the reliability of the test. We evaluate the quality of PRNGs via the chi-square statistic as well as by calculating a statistical distance (the total variation distance) between the empirical distribution of the considered characteristic for generated pseudorandom output and its theoretical distribution for truly random binary sequences.
The experimental results presented in this paper show that our testing procedure can be used for detecting weaknesses in many common PRNG implementations. Like the Law of Iterated Logarithm test from [18], the ASIN test has revealed flaws and regularities in generated sequences that are not necessarily identified by other state-of-the-art tools such as the NIST SP800-22 Test Suite or TestU01. Thus, testing techniques of this kind seem very promising, as they allow the recognition of kinds of deviations different from those detected by existing tools. Nevertheless, like other statistical tests, the ASIN test is not universal: it captures only one of an immense range of characteristics of random bit strings and does not detect all known flaws. Therefore, testing procedures relying on properties of random walks, such as the ASIN test, should be used along with other tests for a more careful assessment of pseudorandom generators. This is well illustrated by the obviously non-random generator Flawed, whose weaknesses the LIL test failed to detect, while the ASIN test turned out to be very sensitive to this kind of deviation. Hence, an important line of further research is to develop other novel tests exploiting various properties of random walks. Such tests, when combined, should be capable of detecting more hidden dependencies between consecutive bits in the sequences generated by PRNGs. This could lead to the design of more robust test suites for evaluating the quality of random numbers produced by new PRNG implementations as well as by those already in use, especially for cryptographic purposes.
Acknowledgements
We would like to thank the anonymous reviewers whose suggestions and insightful comments helped significantly improve and clarify this manuscript. In particular we thank one of the reviewers for pointing out the article [4] on second level tests.
References
- [1]
G. Marsaglia, The structure of linear congruential sequences, in: S. Zaremba (Ed.), Applications of Number Theory to Numerical Analysis, Academic Press, 1972, pp. 249–285.
- [2]
D. E. Knuth, The art of computer programming, Volume 2: Seminumerical Algorithms, 3rd Edition, Addison-Wesley Pub. Co, 1997.
- [3]
P. L’Ecuyer, Testing random number generators, in: Proceedings of the 24th Conference on Winter Simulation, WSC ’92, ACM, New York, NY, USA, 1992, pp. 305–313.
- [4]
F. Pareschi, R. Rovatti, G. Setti, Second-level NIST randomness tests for improving test reliability, in: ISCAS, IEEE, 2007, pp. 1437–1440.
- [5]
R. G. Brown, D. Eddelbuettel, D. Bauer, Dieharder: A Random Number Test Suite, http://www.phy.duke.edu/~rgb/General/dieharder.php.
- [6]
P. L’Ecuyer, R. Simard, TestU01: A C library for empirical testing of random number generators, ACM Transactions on Mathematical Software 33 (4) (2007) 22.
- [7]
P. L’Ecuyer, R. Simard, TestU01: A Software Library in ANSI C for Empirical Testing of Random Number Generators. Software user’s guide, version of May 16, http://simul.iro.umontreal.ca/testu01/tu01.html/ (2013).
- [8]
NIST.gov - Computer Security Division - Computer Security Resource Center, NIST Test Suite, csrc.nist.gov/groups/ST/toolkit/rng/index.html (2010).
- [9]
A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, S. Vo, A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, Tech. Rep. Rev. 1a, NIST (2010).
- [10]
F. Pareschi, R. Rovatti, G. Setti, Second-level testing revisited and applications to NIST SP800-22, in: 2007 18th European Conference on Circuit Theory and Design, IEEE, 2007, pp. 627–630.
- [11]
P. L’Ecuyer, R. Simard, S. Wegenkittl, Sparse serial tests of uniformity for random number generators, SIAM J. Sci. Comput. 24 (2) (2002) 652–668.
- [12]
M. Matsumoto, T. Nishimura, A Nonempirical Test on the Weight of Pseudorandom Number Generators, in: Monte Carlo and Quasi-Monte Carlo Methods 2000, Springer Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 381–395.
- [13]
P. C. Leopardi, Testing the tests: Using random number generators to improve empirical tests, in: L’ Ecuyer P., Owen A. (eds) Monte Carlo and Quasi-Monte Carlo Methods, 2009, pp. 501–512.
- [14]
C. Kim, G. H. Choe, D. H. Kim, Tests of randomness by the gambler’s ruin algorithm, Applied Mathematics and Computation 199 (1) (2008) 195–210.
- [15]
H. Ekkehard, A. Grønvik, Re-seeding invalidates tests of random number generators, Applied Mathematics and Computation 217 (1) (2010) 339–346.
- [16]
P. Lorek, M. Słowik, F. Zagórski, Statistical testing of PRNG: Generalized gambler’s ruin problem, in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 10693 LNCS, 2017, pp. 425–437.
- [17]
W. Feller, An introduction to probability theory and its applications, Volume 1, 3rd Edition, John Wiley & Sons, 1968.
- [18]
Y. Wang, T. Nicol, On statistical distance based testing of pseudo random sequences and experiments with PHP and Debian OpenSSL, Computers & Security 53 (2015) 44–64.
- [19]
P. Lorek, G. Łoś, F. Zagórski, K. Gotfryd, PRNG_Arcsine_tester: The arcsine law based statistical testing of PRNGs. GitHub repository, https://github.com/lorek/PRNG_Arcsine_test (2018).
- [20]
S. Asmussen, P. Glynn, Stochastic Simulation: Algorithms and Analysis, Springer, 2007.
- [21]
D. P. Kroese, T. Taimre, Z. I. Botev, Handbook of Monte Carlo Methods, John Wiley & Sons, Inc., Hoboken, NJ, USA, 2011.
- [22]
P. L’Ecuyer, History of uniform random number generation, in: 2017 Winter Simulation Conference (WSC), IEEE, 2017, pp. 202–230.
- [23]
H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bulletin of the American Mathematical Society 84 (6) (1978) 957–1041.
- [24]
M. Denker, W. A. Woyczynski, Introductory statistics and random phenomena : uncertainty, complexity, and chaotic behavior in engineering and science, Birkhäuser Boston, 1998.
- [25]
A. Gut, Probability : a graduate course, Springer, 2005.
- [26]
A. Khintchine, Über einen Satz der Wahrscheinlichkeitsrechnung, Fundamenta Mathematicae 6 (1) (1924) 9–20.
- [27]
L. Devroye, The equivalence of weak, strong and complete convergence in L1 for kernel density estimates, Ann. Statist. 11 (3) (1983) 896–904.
- [28]
D. Berend, A. Kontorovich, On the convergence of the empirical distribution, https://arxiv.org/abs/1205.6711v2 (2012).
- [29]
H. Haramoto, Automation of Statistical Tests on Randomness, in: Monte Carlo and Quasi-Monte Carlo Methods 2008, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 411–421.
- [30]
W. Feller, An Introduction to Probability Theory and Its Applications, Volume 2, 2nd Edition, John Wiley & Sons, 1971.
- [31]
C. Esseen, On the liapunoff limit of error in the theory of probability, Ark. Mat. Astr. Fysik 28A (2) (1942) 1–19.
- [32]
I. S. Tyurin, Refinement of the upper bounds of the constants in Lyapunov’s theorem, Russian Mathematical Surveys 65 (3) (2010) 586–588.
URL http://stacks.iop.org/0036-0279/65/i=3/a=L09
- [33]
A. Dvoretzky, T. Motzkin, A problem of arrangements, Duke Mathematical Journal 14 (2) (1947) 305–313.
- [34]
S. Harase, Conversion of Mersenne Twister to double-precision floating-point numbers, https://arxiv.org/abs/1708.06018 (2017).
- [35]
M. Haahr, RANDOM.ORG: true random number service, https://www.random.org (Accessed: 2018-07-01).
- [36]
K. Takashima, Sojourn time test for maximum-length linearly recurring sequences with characteristic primitive trinomials 7 (1994) 77–87.
- [37]
K. Takashima, Sojourn time test of m-sequences with characteristic pentanomials, Journal of the Japanese Society of Computational Statistics 8 (1995) 37–46.
- [38]
K. Takashima, Last visit time tests for pseudorandom numbers, Journal of the Japanese Society of Computational Statistics 9 (1) (1996) 1–14.