EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration

Allen Nie; Yi Su; Bo Chang; Jonathan N. Lee; Ed H. Chi; Quoc V. Le; Minmin Chen

arXiv:2410.06238·cs.LG·July 15, 2025

Open Access

TL;DR

This paper evaluates how well large language models explore and make decisions in bandit settings. It proposes ways to integrate exploration algorithms into LLMs and shows that, with this algorithm-guided support, smaller models can outperform larger ones.

Contribution

The paper introduces a comprehensive benchmarking suite for LLMs in exploration tasks and proposes techniques for incorporating optimal exploration algorithms into LLMs to improve their decision-making.

Findings

01. Smaller models can outperform larger ones with algorithm-guided support.

02. Explicit algorithm integration improves exploration efficiency.

03. Task difficulty and data representation significantly affect exploration performance.
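
One plausible reading of "data representation" in the third finding is the way a bandit interaction history is rendered as text before it is shown to the model. This summary does not specify which formats the paper compares, so the sketch below is only an illustration of two such renderings. The function names `raw_history` and `summarized_history`, the arm labels, and the line formats are all hypothetical.

```python
from collections import defaultdict


def raw_history(history):
    """Render every (arm, reward) interaction as its own line of text."""
    return "\n".join(
        f"Step {step}: arm {arm}, reward {reward}"
        for step, (arm, reward) in enumerate(history, start=1)
    )


def summarized_history(history):
    """Render one line per arm with its pull count and mean reward."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for arm, reward in history:
        counts[arm] += 1
        totals[arm] += reward
    return "\n".join(
        f"Arm {arm}: pulled {counts[arm]} times, "
        f"mean reward {totals[arm] / counts[arm]:.2f}"
        for arm in sorted(counts)
    )


# Hypothetical history of (arm label, reward) pairs, for illustration only.
example_history = [("A", 1), ("B", 0), ("A", 0)]
```

The raw form grows by one line per step, while the summarized form stays at one line per arm, which is one reason the choice of representation could matter for long interaction histories.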

Abstract

Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs' performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during…
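
The abstract refers to optimal exploration algorithms without naming them in the portion shown here. As a point of reference, UCB1 is one classical algorithm of this kind for context-free bandits. The sketch below is illustrative only and is not taken from the paper: the function name `ucb1`, the Bernoulli reward model, and the example arm means are assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit.

    arm_means: true success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Pull each arm once so every empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the highest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret is measured against always pulling the best arm.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.1, 0.9], horizon=2000)
    print(pulls, total_regret)
```

In a sketch like this, cumulative regret is the quantity typically used to compare an agent against an algorithmic baseline on the same bandit instance.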

Taxonomy

Topics: Distributed and Parallel Computing Systems · Reservoir Engineering and Simulation Methods