EVOLvE: Evaluating and Optimizing LLMs For In-Context Exploration
Allen Nie, Yi Su, Bo Chang, Jonathan N. Lee, Ed H. Chi, Quoc V. Le, Minmin Chen

TL;DR
This paper evaluates how well large language models make decisions and explore in bandit settings. It proposes ways to integrate exploration algorithms into LLMs and reports that smaller models with this support can outperform larger ones.
Contribution
The paper introduces a benchmark suite of context-free and contextual bandit environments of varying difficulty for evaluating LLM exploration, and proposes techniques for incorporating knowledge of optimal exploration algorithms into LLMs to improve their decision-making.
Findings
Smaller models can outperform larger ones with algorithm-guided support.
Explicit algorithm integration improves exploration efficiency.
Task difficulty and data representation significantly affect exploration performance.
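To make the point about data representation concrete, the sketch below shows two ways the same bandit interaction history could be rendered as text for a model: a raw step-by-step log and a per-arm summary. This summary does not specify the formats used in the paper, so the function names (`raw_history`, `summarized_history`) and the output wording are hypothetical and purely illustrative.

```python
def raw_history(history):
    """Render every (arm, reward) pair as one line of text (hypothetical format)."""
    return "\n".join(
        f"Step {i}: pulled {arm}, reward {reward}"
        for i, (arm, reward) in enumerate(history, start=1)
    )


def summarized_history(history):
    """Render per-arm pull counts and mean rewards (hypothetical format)."""
    stats = {}  # arm -> (number of pulls, sum of rewards)
    for arm, reward in history:
        pulls, total = stats.get(arm, (0, 0.0))
        stats[arm] = (pulls + 1, total + reward)
    return "\n".join(
        f"{arm}: pulls={pulls}, mean reward={total / pulls:.2f}"
        for arm, (pulls, total) in stats.items()
    )


if __name__ == "__main__":
    history = [("A", 1), ("B", 0), ("A", 0)]
    print(raw_history(history))
    print(summarized_history(history))
```

The raw log grows linearly with the number of steps, while the summary stays at one line per arm. That trade-off is one plausible reason representation could matter, though it is an assumption here and not a claim taken from the paper.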
Abstract
Despite their success in many domains, large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty. This is crucial as many real-world applications, ranging from personalized recommendations to healthcare interventions, demand that LLMs not only predict but also actively learn to make optimal decisions through exploration. In this work, we measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications. We develop a comprehensive suite of environments, including both context-free and contextual bandits with varying task difficulties, to benchmark LLMs' performance. Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs: by providing explicit algorithm-guided support during…
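For readers unfamiliar with the setting, here is a minimal sketch of a context-free Bernoulli bandit solved with UCB1, a well-known exploration algorithm. The excerpt above does not name the algorithm the authors use, so UCB1 is an assumption chosen for illustration, and the function name `ucb1`, the arm means, and the horizon are all hypothetical.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit.

    arm_means: true success probability of each arm (unknown to the learner).
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # sum of rewards observed for each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: pull every arm once so each mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the highest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.1, 0.5, 0.9], horizon=2000)
    print("pulls per arm:", counts)
    print("total reward:", total)
```

The exploration bonus shrinks as an arm is pulled more often, so the learner gradually concentrates on the arm with the highest estimated reward while still occasionally revisiting the others.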
Taxonomy
Topics: Distributed and Parallel Computing Systems · Reservoir Engineering and Simulation Methods
