Agents Explore but Agents Ignore: LLMs Lack Environmental Curiosity
Leon Engländer, Sophia Althammer, Ahmet Üstün, Matthias Gallé, Tom Sherborne

TL;DR
This paper demonstrates that current LLM-based agents lack environmental curiosity, failing to exploit unexpected but relevant information, which limits their ability to adapt and maximize utility in various tasks.
Contribution
The paper reveals a gap between discovering and exploiting solutions in LLM agents, identifies factors that affect environmental curiosity, and proposes areas for improvement.
Findings
Agents discover solutions in 79-81% of Terminal-Bench runs
Agents exploit solutions in only 37-50% of Terminal-Bench runs
Agents rarely exploit solutions in AppWorld despite clear documentation
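The discovery and exploitation figures above are per-run rates, and the gap is their difference. Below is a minimal sketch of that arithmetic, assuming each run is logged with two boolean flags. The `Run` record, the `rates` helper, and the toy data are hypothetical illustrations, not the paper's evaluation code or results, and the sketch assumes both rates are measured over all runs.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical per-run record (not the paper's log format)."""
    discovered: bool  # the agent observed the injected solution
    exploited: bool   # the agent then actually used it


def rates(runs):
    """Return (discovery rate, exploitation rate, gap) over all runs.

    Assumption: both rates use the total number of runs as the
    denominator, and a run only counts as exploited if the solution
    was also discovered in that run.
    """
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    discovery = sum(r.discovered for r in runs) / n
    exploitation = sum(r.discovered and r.exploited for r in runs) / n
    return discovery, exploitation, discovery - exploitation


# Made-up toy data, for illustration only.
toy_runs = [
    Run(discovered=True, exploited=True),
    Run(discovered=True, exploited=False),
    Run(discovered=True, exploited=False),
    Run(discovered=False, exploited=False),
]
discovery, exploitation, gap = rates(toy_runs)
print(f"discovery={discovery:.2f} exploitation={exploitation:.2f} gap={gap:.2f}")
```

A large positive gap means agents frequently see the solution without acting on it, which is the pattern the findings describe.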
Abstract
LLM-based agents are assumed to integrate environmental observations into their reasoning: discovering highly relevant but unexpected information should naturally lead to a model exploiting its own discoveries. We show that this assumption is false for current LLM-based agents, which struggle to reflect on or react to unexpected information. Across three benchmarks (Terminal-Bench, SWE-Bench, AppWorld), we inject complete task solutions into the agent environments to deliberately expose a task's solution to a model. While agents discover these solutions on Terminal-Bench in 79-81% of runs, they interact with, or exploit, them in only 37-50% of cases. This gap is starkest in AppWorld: agents see documentation stating that a command "returns the complete solution to this task" in over 90% of attempts but exploit this in fewer than 7% of trials. We show that agents lack what we call environmental…
