Should You Use Your Large Language Model to Explore or Exploit?

Keegan Harris; Aleksandrs Slivkins

arXiv:2502.00225 · cs.LG · February 18, 2026

Open Access

TL;DR

This paper evaluates large language models separately on exploration and on exploitation in (contextual) bandit tasks. Reasoning models are the most promising for exploitation but are often too slow or too costly in practice, and every LLM studied performs worse than a simple linear regression on exploitation. LLMs show more utility for exploration.

Contribution

The paper analyzes LLM performance on exploration and on exploitation in silos, rather than on combined exploration-exploitation tasks, and assesses how tool use and in-context summarization affect non-reasoning models.

Findings

01

Reasoning models show the most promise on exploitation tasks but are still too expensive or too slow for many practical settings.

02

Non-reasoning models benefit from tool use and summarization, improving medium-difficulty task performance.

03

All LLMs studied perform worse than a simple linear regression on exploitation, even in non-linear settings; their utility lies more in exploring large, semantically rich action spaces.

Abstract

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. While previous work has largely studied the ability of LLMs to solve combined exploration-exploitation tasks, we take a more systematic approach and use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that reasoning models show the most promise for solving exploitation tasks, although they are still too expensive or too slow to be used in many practical settings. Motivated by this, we study tool use and in-context summarization using non-reasoning models. We find that these mitigations may be used to substantially improve performance on medium-difficulty tasks; however, even then, all LLMs we study perform worse than a simple linear regression, even in non-linear settings. On the other hand, we find…
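The abstract compares LLMs against "a simple linear regression" on exploitation tasks without spelling out that baseline here. As a rough illustration of what such a baseline could look like in a contextual bandit, and not the authors' implementation, the sketch below fits one least-squares model per action on the observed history and picks the action with the highest predicted reward. The function names `fit_linear` and `greedy_action`, the `(context, action, reward)` history format, the ridge term, and the synthetic data are all assumptions made for this example.

```python
import random


def fit_linear(xs, ys, ridge=1e-6):
    """Least-squares weights for y ~ w.x via the normal equations.

    Solves (X^T X + ridge * I) w = X^T y with Gaussian elimination.
    The small ridge term (an assumption) keeps the system solvable
    when an action has very few observations.
    """
    d = len(xs[0])
    a = [[sum(x[i] * x[j] for x in xs) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    # Forward elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            for c in range(col, d):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * d
    for i in reversed(range(d)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, d))
        w[i] = (b[i] - tail) / a[i][i]
    return w


def greedy_action(history, context, n_actions):
    """Exploitation step: return the action with the best predicted reward.

    `history` is a list of (context, action, reward) tuples (hypothetical
    format). Actions with no observations are skipped, since this sketch
    covers exploitation only and does no exploration.
    """
    feats = list(context) + [1.0]  # append a bias feature
    best_action, best_pred = 0, float("-inf")
    for action in range(n_actions):
        rows = [(list(c) + [1.0], r) for c, a, r in history if a == action]
        if not rows:
            continue
        w = fit_linear([x for x, _ in rows], [y for _, y in rows])
        pred = sum(wi * fi for wi, fi in zip(w, feats))
        if pred > best_pred:
            best_action, best_pred = action, pred
    return best_action


if __name__ == "__main__":
    # Synthetic two-action problem: action 0 pays +x, action 1 pays -x, plus noise.
    rng = random.Random(0)
    history = []
    for _ in range(200):
        x = rng.uniform(-1.0, 1.0)
        action = rng.randrange(2)
        reward = (x if action == 0 else -x) + rng.gauss(0.0, 0.1)
        history.append(((x,), action, reward))
    print(greedy_action(history, (0.8,), n_actions=2))
```

In the siloed setup the abstract describes, an LLM given the same history would be asked for the same decision, so a baseline of this kind provides a cheap reference point for the exploitation half of the problem.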

Taxonomy

Topics: Data Quality and Management · Topic Modeling · Network Security and Intrusion Detection