Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments
Ziyuan Zhang, Darcy Wang, Ningyuan Chen, Rodrigo Mansur, Vahid Sarhangian

TL;DR
This study compares the exploration-exploitation strategies of LLMs and humans using multi-armed bandit experiments. It finds that enabling thinking traces makes LLMs behave more like humans, although their adaptability in more complex, non-stationary environments remains limited.
Contribution
It introduces a comparative analysis of LLMs and humans in sequential decision-making tasks, highlighting how prompting strategies influence LLM behavior and identifying current limitations of LLMs in these tasks.
Findings
Enabling thinking traces shifts LLM behavior toward human-like exploration.
In stationary settings, LLMs match human exploration levels.
In non-stationary environments, LLMs struggle with adaptability despite similar regret.
Abstract
Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making settings. A natural question is then whether LLMs exhibit similar decision-making behavior to humans, and can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) experiments introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how enabling thinking traces, through both prompting strategies and thinking models, shapes LLM decision-making. We find that enabling thinking in LLMs shifts their behavior…
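To make the idea of an "interpretable choice model" concrete, here is a minimal sketch of one common family for two-armed bandits: a logistic model that weighs the difference in estimated mean reward (exploitation) against the difference in uncertainty (directed exploration). This is an illustrative assumption, not the model specified in the paper; the function names `arm_estimates` and `choice_prob_arm0`, the 1/sqrt(n+1) uncertainty proxy, and the weights `beta` and `gamma` are all hypothetical.

```python
import math


def arm_estimates(rewards):
    """Return (sample mean, uncertainty proxy) for one arm's observed rewards.

    Hypothetical choice: the uncertainty proxy 1 / sqrt(n + 1) simply shrinks
    as the arm is pulled more often; it is not a calibrated posterior std.
    """
    n = len(rewards)
    mean = sum(rewards) / n if n else 0.0
    uncertainty = 1.0 / math.sqrt(n + 1)
    return mean, uncertainty


def choice_prob_arm0(mean0, mean1, unc0, unc1, beta, gamma):
    """Probability of picking arm 0 over arm 1 under a logistic choice model.

    beta  weights the difference in estimated means (exploitation); a smaller
          |beta| makes choices less sensitive to that difference.
    gamma weights the difference in uncertainty (directed exploration):
          gamma > 0 favors the less-explored arm, gamma < 0 avoids it.
    """
    utility = beta * (mean0 - mean1) + gamma * (unc0 - unc1)
    return 1.0 / (1.0 + math.exp(-utility))


if __name__ == "__main__":
    # Arm 0 has been pulled three times, arm 1 only once.
    m0, u0 = arm_estimates([1.0, 0.0, 1.0])
    m1, u1 = arm_estimates([1.0])
    # Compare no uncertainty effect, uncertainty seeking, and uncertainty aversion.
    for gamma in (0.0, 2.0, -2.0):
        p = choice_prob_arm0(m0, m1, u0, u1, beta=3.0, gamma=gamma)
        print(f"gamma={gamma:+.1f}  P(choose arm 0)={p:.3f}")
```

In a sketch like this, fitting `beta` and `gamma` to an agent's observed choices (for example by maximum likelihood) would give a small set of parameters per agent, which is what makes such models convenient for comparing LLMs, humans, and bandit algorithms side by side.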
