Bounded Rationality in Las Vegas: Probabilistic Finite Automata Play Multi-Armed Bandits
Xinming Liu, Joseph Y. Halpern

TL;DR
This paper models human decision-making in multi-armed bandit problems using probabilistic finite automata, showing that an agent with limited computational capacity can perform near-optimally while exhibiting human-like biases.
Contribution
It introduces a simple PFA-based strategy for MABs, showing how limited computational resources can explain human-like biases and near-optimal performance.
Findings
The PFA strategy performs near-optimally when it has sufficiently many states
Performance degrades gracefully as the number of states decreases
The PFA exhibits human-like biases, such as optimism and negativity biases
Abstract
While traditional economics assumes that humans are fully rational agents who always maximize their expected utility, in practice, we constantly observe apparently irrational behavior. One explanation is that people have limited computational power, so that they are, quite rationally, making the best decisions they can, given their computational limitations. To test this hypothesis, we consider the multi-armed bandit (MAB) problem. We examine a simple strategy for playing an MAB that can be implemented easily by a probabilistic finite automaton (PFA). Roughly speaking, the PFA sets certain expectations, and plays an arm as long as it meets them. If the PFA has sufficiently many states, it performs near-optimally. Its performance degrades gracefully as the number of states decreases. Moreover, the PFA acts in a "human-like" way, exhibiting a number of standard human biases, like an…
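The idea of "setting expectations and playing an arm as long as it meets them" can be illustrated with a small finite-state agent. The sketch below is a hypothetical illustration and not the construction from the paper: the function name `run_counter_automaton`, the `num_levels` parameter, the Bernoulli rewards, and the rule of switching to a uniformly random other arm are all assumptions made for the example. The agent's state is the current arm plus a bounded "patience" counter; successes raise the counter, failures lower it, and the agent abandons the arm when the counter reaches zero.

```python
import random


def run_counter_automaton(arm_probs, num_levels, horizon, seed=0):
    """Play a Bernoulli bandit with a small counter-based automaton.

    Hypothetical illustration only; NOT the strategy defined in the paper.
    State = (current arm, patience counter in 1..num_levels).
    Returns (average reward per pull, list of pull counts per arm).
    """
    rng = random.Random(seed)
    num_arms = len(arm_probs)
    start_patience = (num_levels + 1) // 2  # assumed: start mid-range

    arm = rng.randrange(num_arms)
    patience = start_patience
    total_reward = 0
    pulls = [0] * num_arms

    for _ in range(horizon):
        # Pull the current arm; reward is 1 with probability arm_probs[arm].
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total_reward += reward
        pulls[arm] += 1

        if reward:
            # Expectation met: gain patience, capped by the state budget.
            patience = min(patience + 1, num_levels)
        else:
            patience -= 1

        if patience == 0:
            # Expectation not met: move to another arm chosen at random.
            if num_arms > 1:
                arm = rng.choice([a for a in range(num_arms) if a != arm])
            patience = start_patience

    return total_reward / horizon, pulls


if __name__ == "__main__":
    for levels in (1, 3, 10):
        avg, pulls = run_counter_automaton([0.3, 0.5, 0.8], levels, 10_000)
        print(levels, round(avg, 3), pulls)
```

In this toy version, `num_levels` plays the role of the state budget: a larger counter makes the agent slower to abandon an arm after a run of bad luck, which is one simple way to see how behavior could depend on the number of available states.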
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
