Exploration and Exploitation Errors Are Measurable for Language Model Agents

Jaden Park; Jungtaek Kim; Jongwon Jeong; Robert D. Nowak; Kangwook Lee; Yong Jae Lee

arXiv:2604.13151 · cs.AI · April 16, 2026

TL;DR

This paper introduces controllable environments and a metric for measuring exploration and exploitation errors in language model agents. It reveals the limitations of current models and shows that simple interventions can improve them.

Contribution

It presents a novel evaluation framework with adjustable difficulty and a metric for quantifying exploration and exploitation errors in LM agents.

Findings

01. State-of-the-art models struggle with the tasks in the new environments.

02. Reasoning models perform better than other models.

03. Minimal engineering can significantly improve exploration and exploitation.

Abstract

Language Model (LM) agents are increasingly used in complex, open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these settings is the ability to both explore the problem space and exploit acquired knowledge effectively. However, systematically distinguishing and quantifying exploration and exploitation from observed actions, without access to the agent's internal policy, remains challenging. To address this, we design controllable environments inspired by practical embodied AI scenarios. Each environment consists of a partially observable 2D grid map and an unknown task Directed Acyclic Graph (DAG). The map generation can be programmatically adjusted to emphasize exploration or exploitation difficulty. To enable policy-agnostic evaluation, we design a metric to quantify exploration and exploitation errors from the agent's actions. We evaluate a variety of…
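
The abstract describes each environment as a partially observable 2D grid map paired with a task DAG that the agent does not know in advance. The sketch below is a minimal, hypothetical illustration of that structure, not code from the authors' repository. The class name GridTaskEnv, the square field of view (view_radius), the one-task-node-per-cell layout, and the rule that a node is completed by standing on its cell once all prerequisites are done are all assumptions made for illustration. The sketch does not implement the paper's exploration/exploitation error metric, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

Cell = tuple[int, int]


@dataclass
class GridTaskEnv:
    """Hypothetical toy environment: a partially observable grid plus a hidden task DAG.

    Not the authors' implementation; names and rules are illustrative assumptions.
    """
    width: int
    height: int
    items: dict[Cell, str]        # cell -> task node located there (assumed one node per cell)
    prereqs: dict[str, set[str]]  # task DAG: node -> prerequisite nodes (hidden from the agent)
    view_radius: int = 1          # assumed square field of view around the agent
    agent: Cell = (0, 0)
    seen: set[Cell] = field(default_factory=set)
    done: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Consuming static_order() raises graphlib.CycleError if the task graph has a cycle.
        list(TopologicalSorter(self.prereqs).static_order())
        self._reveal()

    def _reveal(self) -> None:
        # Mark every in-bounds cell within view_radius of the agent as observed.
        x, y = self.agent
        r = self.view_radius
        for cx in range(x - r, x + r + 1):
            for cy in range(y - r, y + r + 1):
                if 0 <= cx < self.width and 0 <= cy < self.height:
                    self.seen.add((cx, cy))

    def observation(self) -> dict[Cell, str]:
        # The agent only sees items in cells it has already observed.
        return {c: n for c, n in self.items.items() if c in self.seen}

    def available_tasks(self) -> set[str]:
        # A node is doable once all of its prerequisites are done.
        return {n for n, pre in self.prereqs.items()
                if n not in self.done and pre <= self.done}

    def move(self, dx: int, dy: int) -> None:
        # Clamp the move to the grid, then update the observed region.
        x, y = self.agent
        self.agent = (min(max(x + dx, 0), self.width - 1),
                      min(max(y + dy, 0), self.height - 1))
        self._reveal()

    def complete(self, node: str) -> bool:
        # Succeeds only on the node's own cell with all prerequisites satisfied.
        if node in self.available_tasks() and self.items.get(self.agent) == node:
            self.done.add(node)
            return True
        return False


# Toy instance (hypothetical): "table" requires both "wood" and "stone".
env = GridTaskEnv(
    width=5, height=5,
    items={(4, 0): "wood", (0, 4): "stone", (2, 2): "table"},
    prereqs={"wood": set(), "stone": set(), "table": {"wood", "stone"}},
)
```

In this toy instance the agent starts at (0, 0) seeing no items. It must move to discover "wood" and "stone" before "table" becomes completable. Under these assumptions, a larger or sparser grid plausibly makes discovery (exploration) harder, while a deeper DAG makes correctly using already-found items (exploitation) harder. How the paper actually tunes these difficulties is defined in the paper itself.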

Code & Models

Repositories

jjj-madison/measurable-explore-exploit (GitHub)