Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

Ziyuan Zhang; Darcy Wang; Ningyuan Chen; Rodrigo Mansur; Vahid Sarhangian

arXiv:2505.09901·cs.LG·May 4, 2026

TL;DR

This study compares the exploration-exploitation strategies of LLMs and humans in standard multi-armed bandit experiments. Enabling thinking traces makes LLMs behave more like humans, but their adaptability in complex, non-stationary environments remains limited.

Contribution

The paper presents a comparative analysis of LLMs and humans in sequential decision-making tasks, showing how prompting strategies influence LLM behavior and identifying current limitations.

Findings

01 Enabling thinking traces shifts LLM behavior toward human-like exploration.

02 In stationary settings, LLMs match human exploration levels.

03 In non-stationary environments, LLMs struggle with adaptability despite similar regret.

Abstract

Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making settings. A natural question is whether LLMs exhibit decision-making behavior similar to that of humans and whether they can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) experiments introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how enabling thinking traces, through both prompting strategies and thinking models, shapes LLM decision-making. We find that enabling thinking in LLMs shifts their behavior…
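To make the setup concrete, here is a minimal sketch of a stationary Bernoulli bandit played by a simple softmax choice rule, with cumulative regret tracked along the way. It is an illustration only, not the paper's choice model or experimental design: the function names (`softmax_choice`, `run_bandit`), the `temperature` parameter, the arm means, and the horizon are all assumptions made for this example.

```python
import math
import random


def softmax_choice(values, temperature, rng):
    """Sample an arm with probability proportional to exp(value / temperature)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for arm, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return arm
    return len(values) - 1  # guard against floating-point round-off


def run_bandit(true_means, horizon, temperature=0.1, seed=0):
    """Play a stationary Bernoulli bandit; return the choices and cumulative regret.

    Hypothetical illustration: a lower temperature means more exploitation,
    a higher temperature means more random exploration.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms  # running mean reward per arm
    best_mean = max(true_means)
    regret = 0.0
    choices = []
    for _ in range(horizon):
        arm = softmax_choice(estimates, temperature, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best_mean - true_means[arm]  # expected (pseudo-)regret
        choices.append(arm)
    return choices, regret


if __name__ == "__main__":
    choices, regret = run_bandit([0.3, 0.7], horizon=200)
    print("fraction of pulls on the better arm:", sum(choices) / len(choices))
    print("cumulative regret:", regret)
```

Fitting a rule of this kind to observed choice sequences is one way an agent's exploration tendency can be summarized by a small number of interpretable parameters. The models actually used in the paper may differ from this sketch.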
