How Random is Random? Evaluating the Randomness and Humaness of LLMs' Coin Flips
Katherine Van Koevering, Jon Kleinberg

TL;DR
This paper investigates how large language models generate binary sequences, revealing that GPT 4 and Llama 3 show human-like biases while GPT 3.5 behaves more randomly, raising questions about the nature of randomness and humanness in AI.
Contribution
The study provides a comparative analysis of LLMs' ability to produce random sequences, highlighting differences in bias and randomness among GPT models and Llama 3.
Findings
GPT 4 and Llama 3 exhibit human biases in randomness tasks
GPT 3.5 demonstrates more random, less biased behavior
The dichotomy raises questions about the utility of human-like versus random outputs
Abstract
One uniquely human trait is our inability to be random. We see and produce patterns where there should not be any, and we do so in a predictable way. LLMs are supplied with human data and are prone to human biases. In this work, we explore how LLMs approach randomness, and where and how they fail, through the lens of the well-studied phenomenon of generating binary random sequences. We find that GPT 4 and Llama 3 exhibit and exacerbate nearly every human bias we test in this context, but GPT 3.5 exhibits more random behavior. We propose this dichotomy of randomness versus humanness as a fundamental question about LLMs, and suggest that either behavior may be useful in different circumstances.
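To make the idea of bias in a binary sequence concrete, here is a minimal Python sketch of three simple statistics one might compute on a string of coin flips. The abstract does not list which biases the authors test, so the choice of statistics (share of heads, alternation rate, longest run) and all function names here are illustrative assumptions, not the paper's method.

```python
from itertools import groupby


def heads_fraction(seq: str) -> float:
    """Share of flips that are heads ('H') in a string of 'H'/'T' characters."""
    return seq.count("H") / len(seq)


def alternation_rate(seq: str) -> float:
    """Share of adjacent pairs whose two flips differ (H then T, or T then H)."""
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)


def longest_run(seq: str) -> int:
    """Length of the longest streak of identical consecutive flips."""
    return max(len(list(group)) for _, group in groupby(seq))


# Hypothetical example sequence, written by hand for illustration only.
flips = "HTHTHHTHTT"
print(heads_fraction(flips), alternation_rate(flips), longest_run(flips))
```

For a fair coin the expected alternation rate is 0.5. People asked to produce random flips are known to switch more often than that and to avoid long runs, so a sequence with a high alternation rate and short runs looks more human than random under these measures.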
Taxonomy
Topics: Art History and Market Analysis
