Large Language Models are Biased Reinforcement Learners
William M. Hayes, Nicolas Yax, Stefano Palminteri

TL;DR
This paper investigates how large language models (LLMs) perform reinforcement learning tasks. It finds that they exhibit a relative value bias similar to the one documented in humans, and that this bias affects their decision-making and generalization.
Contribution
It demonstrates that LLMs encode relative values in reinforcement learning tasks and that these biases influence their performance and generalization, with evidence from multiple models and tasks.
Findings
LLMs show behavioral signatures of relative value bias.
Adding explicit outcome comparisons to the prompt produces opposing effects on performance.
Biases are present in both fine-tuned and pretrained models.
Abstract
In-context learning enables large language models (LLMs) to perform a variety of tasks, including learning to make reward-maximizing choices in simple bandit tasks. Given their potential use as (autonomous) decision-making agents, it is important to understand how these models perform such reinforcement learning (RL) tasks and the extent to which they are susceptible to biases. Motivated by the fact that, in humans, it has been widely documented that the value of an outcome depends on how it compares to other local outcomes, the present study focuses on whether similar value encoding biases apply to how LLMs encode rewarding outcomes. Results from experiments with multiple bandit tasks and models show that LLMs exhibit behavioral signatures of a relative value bias. Adding explicit outcome comparisons to the prompt produces opposing effects on performance, enhancing maximization in…
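To make the idea of a relative value bias concrete, here is a minimal sketch of how range-normalized value encoding can distort choices in a two-context bandit task. It is an illustration only, not the authors' task, prompts, or model: the payoffs, the option names, the `range_normalize` and `learn_values` helpers, and the delta-rule learner are all assumptions chosen for clarity.

```python
def range_normalize(reward, low, high):
    """Rescale a reward to [0, 1] relative to its local context's outcome range."""
    if high == low:
        return 0.5
    return (reward - low) / (high - low)


# Hypothetical deterministic payoffs: two learning contexts, two options each.
contexts = {
    "low": {"A": 1.0, "B": 2.0},
    "high": {"C": 3.0, "D": 4.0},
}


def learn_values(contexts, relative, alpha=0.5, n_trials=50):
    """Learn one value per option with a simple delta rule.

    If `relative` is True, each outcome is first rescaled within its own
    context (an assumed form of relative encoding); otherwise the raw
    payoff is used.
    """
    q = {}
    for options in contexts.values():
        low, high = min(options.values()), max(options.values())
        for name, payoff in options.items():
            target = range_normalize(payoff, low, high) if relative else payoff
            value = 0.0
            for _ in range(n_trials):
                value += alpha * (target - value)  # delta-rule update
            q[name] = value
    return q


absolute_q = learn_values(contexts, relative=False)
relative_q = learn_values(contexts, relative=True)

# Transfer test: pair the best option of the low context (B, pays 2)
# with the worst option of the high context (C, pays 3).
transfer_choice_absolute = max(["B", "C"], key=absolute_q.get)
transfer_choice_relative = max(["B", "C"], key=relative_q.get)

print("absolute learner picks:", transfer_choice_absolute)
print("relative learner picks:", transfer_choice_relative)
```

In this sketch the absolute learner picks C, the option with the higher payoff, while the relative learner picks B, because B was the best outcome in its original context. A preference of this kind in a transfer test is one way a relative value bias could show up behaviorally.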
Taxonomy
Topics: Speech and dialogue systems
