Random problems with R
Kellie Ottoboni, Philip B. Stark

TL;DR
This paper identifies a bias in R's random sampling method due to quantization effects, and proposes a fix by generating random integers directly from random bits, improving sampling uniformity.
Contribution
It identifies the bias in R's current random integer generation and proposes a simple fix: generate random integers directly from random bits, so that random samples are unbiased.
Findings
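R's current method (scaling a random float to the target range and taking the floor) produces a non-uniform distribution of random integers, so sampling is biased.
Generating random integers directly from random bits avoids this quantization bias.
Python's numpy.random.randint() already uses this approach.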
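Counting outcomes exactly shows why the floor-based method cannot be uniform in general. The sketch below assumes a simplified model in which the uniform float U takes each of the 2^k values j / 2^k with equal probability (k bits of resolution) and is mapped to floor(m * U) + 1. The function name floor_method_counts is hypothetical, chosen for illustration, and small k is used only so every outcome can be enumerated.

```python
from collections import Counter


def floor_method_counts(m: int, k: int) -> Counter:
    """Count how often each integer in 1..m is produced by floor(m * U) + 1
    when U ranges over all 2**k equally likely values j / 2**k.

    Hypothetical helper; assumes U has exactly k bits of resolution
    (a simplified model of a float-based uniform generator)."""
    counts = Counter()
    for j in range(2 ** k):
        # Integer arithmetic gives floor(m * j / 2**k) with no rounding error.
        counts[(m * j) // (2 ** k) + 1] += 1
    return counts


if __name__ == "__main__":
    m, k = 3, 4  # 16 equally likely floats mapped onto 3 target values
    counts = floor_method_counts(m, k)
    for value in range(1, m + 1):
        print(value, counts[value], counts[value] / 2 ** k)
```

With m = 3 and k = 4, the 16 equally likely floats map to the values 1, 2 and 3 with counts 6, 5 and 5. In general each value receives either floor(2^k / m) or ceil(2^k / m) of the 2^k outcomes, so the distribution is uniform only when m divides 2^k; when m is close to 2^k, some values become twice as likely as others.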
Abstract
R (Version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between 1 and m by multiplying random floats by m, taking the floor, and adding 1 to the result. Well-known quantization effects in this approach result in a non-uniform distribution on {1, ..., m}. The difference, which depends on m, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by m. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see functions getrandbits() and randbelow_from_randbits()).
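The bit-based fix can be sketched with the standard library's random.Random.getrandbits(). This is an illustrative sketch of the general idea, not the code in the cryptorandom repository: the function name random_int_from_bits is hypothetical, and the sketch assumes the generator's bits are independent and uniform. It draws the smallest number of bits k with 2^k >= m and rejects any draw outside 0..m-1 instead of folding it back into range, so no value is favored.

```python
import random
from collections import Counter


def random_int_from_bits(m: int, rng: random.Random) -> int:
    """Return an integer uniformly distributed on 1..m, built from random bits.

    Hypothetical helper for illustration. Assumes rng.getrandbits() yields
    independent, uniformly distributed bits."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        return 1  # only one possible value; no bits needed
    k = (m - 1).bit_length()  # smallest k with 2**k >= m
    while True:
        r = rng.getrandbits(k)  # uniform on 0..2**k - 1
        if r < m:  # reject out-of-range draws rather than wrapping them
            return r + 1


if __name__ == "__main__":
    rng = random.Random(2018)  # seeded only so the run is reproducible
    draws = Counter(random_int_from_bits(3, rng) for _ in range(30_000))
    print(sorted(draws.items()))  # each count should be close to 10,000
```

Because 2^k < 2m, each draw is accepted with probability greater than 1/2, so the expected number of draws per integer is less than 2. The cost of exact uniformity is a variable rather than fixed number of draws.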
