A revision of the subtract-with-borrow random number generators
Alexei Sibidanov

TL;DR
This paper reimplements the popular subtract-with-borrow random number generator RANLUX as a large-integer linear congruential generator, achieving faster performance and improved seeding capabilities on modern computers.
Contribution
It introduces a novel, efficient modular multiplication method for RANLUX, enabling faster random number generation and unique sequence seeding.
Findings
Significant speed improvement over traditional RANLUX
Comparable performance to other high-quality generators
Fast state skipping for unique sequence seeding
Abstract
The most popular and widely used subtract-with-borrow generator, also known as RANLUX, is reimplemented as a linear congruential generator using large integer arithmetic with a modulus size of 576 bits. Modern computers, as well as the specific structure of the modulus inferred from RANLUX, allow for the development of a fast modular multiplication, the core of the procedure. This was previously believed to be slow and to have too high a cost in terms of computing resources. Our tests show a significant gain in generation speed, which is comparable with other fast, high-quality random number generators. An additional feature is the fast skipping of generator states, leading to a seeding scheme which guarantees the uniqueness of random number sequences.
| luxury level | $p$ | $a^p \bmod m$ (hexadecimal) |
|---|---|---|
| 0 | 24 | fffffffffffffffffffffffefffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe000000000000000000000001000000000000000000000000000000000000 |
| 1 | 48 | 000000000000000000000002ffffffffffffffffffffffff000000000000000000000000000000000001fffffffffffffffffffffffc000000000000000000000001000000000001 |
| 2 | 97 | ffffff000000000008000000000009fffffffffffefffffffffff1000000000000000000000006ffffff000004fffffffffff6ffffffffffec000000000001000000000015000001 |
| 3 | 223 | 00028b000000000bba00000000026cfffffffff8e4fffffffff96000000000027b0000000007d0fffffffffe25ffffffffeef0fffffffffa0a000000000942000000000ba6000000 |
| 4 | 389 | 0df0600000002ee0020000000b9242ffffffdf6604ffffffe4ab160000000d92ab0000001e93f2fffffff593cfffffffb9c8a6ffffffe525740000002c38960000002ecac9000000 |
| | 1024 | e1754cefa19deea6f58651c8ac11b437ba841c49eca3003ff0ef508f058cfdab6105ca16980e6a3ab12a823219e1cd0007281433953609f1cc9c5ca19cf7f0c6d3899b14b7c5ee90 |
| | 2048 | b48c187cf5b22097492edfcc0cc8e753ff74e54107684ed2256c3d3c662ea36c20b2ca60cb78c5096d8a15a13bee7cb0e64dcb31c48228ec4cec2c78af55c101ed7faa90747aaad9 |
| | Core2 | Haswell | Skylake |
|---|---|---|---|
| mul_basecase_core2 | 370.9 | 218.5 | 197.1 |
| mul_basecase_coreihwl | n.a. | 205.6 | 203.7 |
| mul_basecase_coreibwl | n.a. | n.a. | 163.0 |
| mul9x9 | 534.7 | 198.8 | 192.0 |
| mul9x9mulx | n.a. | 162.1 | 154.5 |
| mul9x9mulxadox | n.a. | n.a. | 119.4 |
| | Haswell | Skylake |
|---|---|---|
| mulmod9x9mulx | 201.2 | 191.6 |
| mulmod9x9mulxadox | n.a. | 155.4 |
| | double | float |
|---|---|---|
| dummy | 9.1 | 9.1 |
| std::minstd_rand | 35.1 | 20.2 |
| std::mt19937_64 | 36.0 | 37.0 |
| std::ranlux24_base | 47.2 | 26.0 |
| std::ranlux48_base | 24.4 | 26.1 |
| std::ranlux24 | 387.0 | 197.7 |
| std::ranlux48 | 640.5 | 638.4 |
| gsl_ranlxs0 | 84.2 | |
| gsl_ranlxs1 | 125.9 | |
| gsl_ranlxs2 | 215.5 | |
| gsl_ranlxd1 | 213.8 | |
| gsl_ranlxd2 () | 394.0 | |
| gsl_ranlux () | 185.0 | |
| gsl_ranlux389 () | 315.4 | |
| TRandom1 () | 362.7 | |
| RANLUX () | 378.3 | |
| ranlxs (array, SSE, ) | 50.4 | |
| ranlxd (array, SSE, ) | 95.0 | |
| RANLUX++ | 29.2 | 20.0 |
| RANLUX++ (array) | 24.7 | 15.7 |
| scalar | scalar(asm) | SSE2 | AVX2 |
|---|---|---|---|
| 40.3 | 28.5 | 12.0 | 7.95 |
A revision of the subtract-with-borrow random number generators
Alexei Sibidanov
University of Victoria, Victoria, BC, Canada V8W 3P6
Keywords: Linear congruential generator; Subtract-with-borrow generator; RANLUX; GMP
Journal: Computer Physics Communications
PROGRAM SUMMARY
Program Title: RANLUX++
Licensing provisions: GPLv3
Programming language: C++, C, Assembler
1 Introduction
The well-known Linear Congruential Generator (LCG) is a recurrent sequence of numbers calculated as follows:
$$x_{i+1} = (a \cdot x_i + c) \bmod m, \qquad (1)$$
where $x_0$ is the initial state or seed, $a$ is the multiplier, $c$ is the increment and $m$ is the modulus. Particular choices of the parameters $a$, $c$ and $m$, together with the resulting period (the minimal number $p$ for which $x_p = x_0$), can be found in the literature [1]. Commonly used LCGs are limited to moduli of machine-word size, $m \le 2^{64}$, and have poor statistical properties. Thus they are not used for Monte Carlo physical simulations.
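As a minimal illustration of the recurrence, the following Python sketch steps a toy LCG and measures its period by brute force. The function names and the parameters $a = 5$, $c = 3$, $m = 16$ are hypothetical, chosen only to keep the example small; they are not taken from the paper.

```python
def lcg_step(x, a, c, m):
    """One LCG step: x -> (a*x + c) mod m."""
    return (a * x + c) % m


def period(x0, a, c, m):
    """Smallest p > 0 with x_p == x_0, found by brute force.

    Only meant for toy moduli, and assumes gcd(a, m) == 1 so that the
    map is a bijection and the sequence is guaranteed to return to x0.
    """
    x, p = lcg_step(x0, a, c, m), 1
    while x != x0:
        x, p = lcg_step(x, a, c, m), p + 1
    return p
```

For these toy parameters, `period(0, 5, 3, 16)` visits all 16 residues before returning to the seed.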
This situation can be mitigated when $m$ reaches several hundred or even thousands of bits. The cost of the increased range is the need for arbitrary-precision integer arithmetic, which was believed to be prohibitively expensive for practical purposes. In the last two decades there has been tremendous progress in central processing units (CPUs), especially for personal computers (PCs), which can be employed for long arithmetic.
We have explored the possibility of using long arithmetic in an LCG to improve the quality of the generated random numbers and found that, despite a substantial increase in calculations, the time to generate a single random number does not rise proportionally. In fact, for some parameters the computational time decreases compared to ordinary LCGs with a machine-word modulus size.
2 Subtract-with-borrow generator
At this point no specific constraints on the parameters $a$, $c$ and $m$ of the LCG have been applied. As a good starting point we choose the subtract-with-borrow generator, first introduced in [2], where its intimate connection with the LCG was shown as a part of the period calculation. The algorithm was extensively studied in [3] to improve the statistical quality of the generated numbers. Based on this study the generator RANLUX [4] was developed, and it is now widely used in physics simulations as well as in other fields where random numbers of high statistical quality are required. However, the method currently employed by RANLUX to achieve this high quality makes it one of the slowest generators on the market.
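The definition of the subtract-with-borrow generator is the following: let $b$ be an integer greater than 1, called the base, and let $(v_1, v_2, \ldots, v_r, c)$ be a vector of length $r+1$, where $0 \le v_i < b$ and $c$, the carry, equals 0 or 1. Then define a recursive transformation of the vector with the rule:
$$(v_1, v_2, \ldots, v_r, c) \to \begin{cases} (v_2, \ldots, v_r,\; v_{r+1-s} - v_1 - c,\; 0) & \text{if } v_{r+1-s} - v_1 - c \ge 0, \\ (v_2, \ldots, v_r,\; v_{r+1-s} - v_1 - c + b,\; 1) & \text{otherwise}, \end{cases} \qquad (2)$$
where $r > s \ge 1$, and $r$ and $s$ are called the lags. As shown in the work [5], this recursion is equivalent to an LCG with zero increment, the modulus $m = b^r - b^s + 1$, the multiplier $a = m - (m-1)/b$, and with the following relation between the LCG state $x$ and the vector:
$$x = \sum_{i=1}^{r} v_i\, b^{i-1} - \sum_{i=1}^{s} v_{r-s+i}\, b^{i-1} + c. \qquad (3)$$
In the RANLUX generator the lags $r = 24$ and $s = 10$ with the base $b = 2^{24}$ are chosen among other suggested parameters in [2], and thus the modulus $m = 2^{576} - 2^{240} + 1$ is a prime number and the multiplier is $a = 2^{576} - 2^{552} - 2^{240} + 2^{216} + 1$. With those parameters the period is equal to $(m-1)/48 \approx 5.2 \times 10^{171}$.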
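The equivalence can be checked numerically. The sketch below assumes the standard subtract-with-borrow rule with the RANLUX parameters (new digit $v_{r+1-s} - v_1 - c$, plus $b$ on borrow), the modulus $m = b^r - b^s + 1$, the multiplier $a = m - (m-1)/b$, and a state mapping in which the top $s$ digits, shifted down, are subtracted from the full number and the carry is added. The function names `swb_step` and `to_lcg` are hypothetical, and the mapping is an assumption of this sketch rather than a quotation from the paper.

```python
B = 1 << 24             # base b = 2^24
R, S = 24, 10           # lags r and s
M = B**R - B**S + 1     # modulus m = b^r - b^s + 1 (576 bits)
A = M - (M - 1) // B    # multiplier a; equals the inverse of b modulo m


def swb_step(v, c):
    """One subtract-with-borrow step on digits v = [v_1..v_r] and carry c."""
    d = v[R - S] - v[0] - c          # v_{r+1-s} - v_1 - c (0-based indexing)
    c_new = 1 if d < 0 else 0        # borrow if the difference is negative
    return v[1:] + [d + B * c_new], c_new


def to_lcg(v, c):
    """Map a subtract-with-borrow state to an LCG state (assumed relation)."""
    full = sum(d * B**i for i, d in enumerate(v))          # all r digits
    top = sum(d * B**i for i, d in enumerate(v[R - S:]))   # top s digits
    return (full - top + c) % M
```

With these definitions, one `swb_step` on the vector should correspond to one multiplication by `A` modulo `M` on the mapped state, which is what the assertions used to test this sketch verify for a few hundred consecutive steps.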
Due to the selected base, the natural choice for keeping the generator state is a vector of length 24 composed of 24-bit numbers. This implementation uses the properties of the modulus to avoid long arithmetic calculations: a single step, equivalent to one modular multiplication, requires only the subtraction of two 24-bit numbers and carry propagation. In the original FORTRAN implementation, the 24-bit numbers were stored as floats to avoid the integer-to-float conversion, which was costly at that time.
2.1 Remainder
The simple structure of the modulus allows us to calculate the remainder using only additions, subtractions and bit shifts. The modulus, and thus the generator state, has a size of 576 bits and fits into 9 64-bit machine words. The result of the product fits into 18 64-bit machine words, which can be represented as a 48-element array of 24-bit numbers. The number obtained by the procedure shown in Algorithm 1 is congruent to this product modulo $m$ and again fits into the 576-bit state. Note that the product appearing in the procedure also requires only bit shifting, owing to the simple structure of $m$. The carry term is calculated as a sum of the carry bits of each arithmetic operation.
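To illustrate why the structure of the modulus makes the remainder cheap, here is a minimal Python sketch. It is not the paper's Algorithm 1, only a simpler folding scheme built on the same idea: assuming $m = 2^{576} - 2^{240} + 1$, we have $2^{576} \equiv 2^{240} - 1 \pmod m$, so the high part of a product can be folded into the low part with one shift, one addition and one subtraction. The name `reduce_mod_m` is hypothetical.

```python
M = (1 << 576) - (1 << 240) + 1   # assumed RANLUX modulus
MASK = (1 << 576) - 1             # selects the low 576 bits


def reduce_mod_m(z):
    """Reduce a non-negative integer modulo M with shifts, adds and subtracts."""
    while z >> 576:                       # something left above bit 575
        hi, lo = z >> 576, z & MASK
        z = lo + (hi << 240) - hi         # uses 2^576 = 2^240 - 1 (mod M)
    if z >= M:                            # z < 2^576 < 2*M, one subtraction suffices
        z -= M
    return z
```

Each fold strictly decreases the value, so the loop terminates after a few iterations for an 1152-bit product. A production implementation would work on fixed-size machine words and track carries explicitly instead of relying on Python's arbitrary-precision integers.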
2.2 Skipping
Examining the result of a single step of Eq. 1, one can note that the main part of the number is preserved in its successor, merely rotated by 24 bits. This strong correlation is the reason for the poor statistical quality of the original subtract-with-borrow generator [2]. The bright idea developed in [3] is to apply the transformation (2) many times to break correlations between nearby states before using the state for actual physical simulation. The drawback of this method is obvious: all intermediate states have to be explicitly calculated even if they are not needed. Although a single step is simple and consumes few resources, good statistical quality requires several hundred steps, so in total the skipping takes a lot of time. It is a luxury to spend resources and not use the results. Thus so-called luxury levels were introduced as aliases for how many generated numbers have to be wasted.
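The classic discard scheme can be sketched as follows. This assumes the usual convention that 24 numbers are delivered and the next $p - 24$ are computed and thrown away, with the RANLUX parameters $b = 2^{24}$, $r = 24$, $s = 10$. The generator names are hypothetical, and the order in which a real RANLUX implementation delivers digits may differ.

```python
from itertools import islice

B = 1 << 24        # base b = 2^24
R, S = 24, 10      # lags r and s (RANLUX values, assumed)


def swb_digits(v, c=0):
    """Endless stream of new digits from the plain subtract-with-borrow rule."""
    v = list(v)
    while True:
        d = v[R - S] - v[0] - c      # v_{r+1-s} - v_1 - c
        c = 1 if d < 0 else 0        # borrow
        d += B * c
        v = v[1:] + [d]
        yield d


def with_luxury(v, p, c=0):
    """Deliver 24 digits, then compute and discard the next p - 24 (p >= 24)."""
    stream = swb_digits(v, c)
    while True:
        yield from islice(stream, R)        # numbers handed to the user
        for _ in islice(stream, p - R):     # wasted numbers
            pass
```

With $p = 389$, only 24 of every 389 computed digits reach the user, which is the cost that a single precomputed multiplication is meant to avoid.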
Using Eq. 1 we can efficiently skip numbers, since all recurrent steps collapse to a single multiplication:
$$x_{i+p} = a^p \cdot x_i \bmod m = (a^p \bmod m) \cdot x_i \bmod m, \qquad (4)$$
where the factor $a^p \bmod m$ is precomputed, and thus the cost of calculating the next state with or without skipping is the same. Any state in the entire period can be calculated in no more than about $2\log_2 m \approx 1152$ long multiplications using fast exponentiation by squaring, which takes on the order of tens of μsec on modern CPUs.
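A minimal Python sketch of skipping, assuming the RANLUX modulus $m = b^{24} - b^{10} + 1$ with $b = 2^{24}$ and the multiplier $a = m - (m-1)/b$. Python's built-in three-argument `pow` performs the modular exponentiation by repeated squaring; the function name `skip` is hypothetical.

```python
B = 1 << 24
M = B**24 - B**10 + 1     # assumed RANLUX modulus, 2^576 - 2^240 + 1
A = M - (M - 1) // B      # assumed multiplier, the inverse of B modulo M


def skip(x, p):
    """Advance the LCG state x by p steps with one long multiplication."""
    return pow(A, p, M) * x % M
```

Because the exponentiation costs only a number of multiplications proportional to the bit length of `p`, a jump such as `skip(x, 10**30)` is as practical as `skip(x, 389)`. As a consistency check, `format(pow(A, 24, M), "0144x")` should reproduce the luxury level 0 row of the table of precomputed multipliers.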
Table 2.2 shows, for illustrative purposes, the precomputed values of $a^p \bmod m$, where the values of $p$ are taken from [4]. In the initial rows, long chains of 0s or 1s in the binary representation are clearly visible, and this can be interpreted to mean that only a few bits of the previous state contribute to each bit of the new state. Even at the highest luxury level 4 there are still some observable patterns, and a demanding user may not be completely satisfied. For such a user the last two rows would be more attractive, especially since they come for free! Such chaotic multipliers mean that if any single bit of the state is changed, then in the next step the altered state will be completely different from the unaltered one.
With explicit long multiplication, there is no need to keep the multiplier as a power of $a$; it can be adjusted to get the full period, $m-1$. For example, if the multiplier is a primitive root modulo $m$, all numbers in the range $[1, m-1]$ appear in the sequence exactly once for any initial $x_0$ from the same range.
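The effect of choosing a primitive root can be seen on a toy prime modulus. In the sketch below, $m = 7$ and the multipliers 3 (a primitive root modulo 7) and 2 (not a primitive root) are hypothetical values picked for illustration only; the function name `orbit` is likewise an assumption.

```python
def orbit(a, m, x0):
    """States visited by x -> a*x mod m, starting at x0, until x0 recurs.

    Assumes m is prime and 1 <= x0 < m, so the sequence is purely periodic.
    """
    seen, x = [x0], a * x0 % m
    while x != x0:
        seen.append(x)
        x = a * x % m
    return seen
```

With the primitive root 3, `orbit(3, 7, 1)` passes through every number from 1 to 6 before repeating, while `orbit(2, 7, 1)` closes after only three states.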
- [1] D. E. Knuth, The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1997.
- [2] G. Marsaglia, A. Zaman, A new class of random number generators, The Annals of Applied Probability 1 (3) (1991) 462–480. URL http://www.jstor.org/stable/2959748
- [3] M. Lüscher, A portable high-quality random number generator for lattice field theory simulations, Computer Physics Communications 79 (1) (1994) 100–110. doi:10.1016/0010-4655(94)90232-1. URL http://www.sciencedirect.com/science/article/pii/0010465594902321
- [4] F. James, RANLUX: A Fortran implementation of the high-quality pseudorandom number generator of Lüscher, Computer Physics Communications 79 (1) (1994) 111–114. doi:10.1016/0010-4655(94)90233-X. URL http://www.sciencedirect.com/science/article/pii/001046559490233X
- [5] S. Tezuka, P. L’Ecuyer, R. Couture, On the lattice structure of the add-with-carry and subtract-with-borrow random number generators, ACM Trans. Model. Comput. Simul. 3 (4) (1993) 315–331. doi:10.1145/159737.159749. URL http://doi.acm.org/10.1145/159737.159749
- [6] GNU multiple precision arithmetic library. URL https://gmplib.org/
- [7] Draft: C++ international standard. URL http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2011/n3242.pdf
- [8] GNU scientific library. URL https://www.gnu.org/software/gsl/
