Exploitation Is All You Need... for Exploration
Micah Rentschler, Jesse Roberts

TL;DR
This paper shows that in certain structured environments, agents trained solely on a greedy (exploitation-only) objective can develop exploratory behavior without explicit exploration incentives, challenging the traditional framing of the exploration-exploitation trade-off.
Contribution
The paper demonstrates that emergent exploration can arise from greedy training when specific conditions hold: recurring environment structure, agent memory, and long-horizon credit assignment.
Findings
Emergent exploration occurs when environment structure and memory are present.
Removing environment structure or memory eliminates emergent exploration.
Long-horizon credit assignment is not always necessary for exploration.
Abstract
Ensuring sufficient exploration is a central challenge when training meta-reinforcement learning (meta-RL) agents to solve novel environments. Conventional solutions to the exploration-exploitation dilemma inject explicit incentives such as randomization, uncertainty bonuses, or intrinsic rewards to encourage exploration. In this work, we hypothesize that an agent trained solely to maximize a greedy (exploitation-only) objective can nonetheless exhibit emergent exploratory behavior, provided three conditions are met: (1) Recurring Environmental Structure, where the environment features repeatable regularities that allow past experience to inform future choices; (2) Agent Memory, enabling the agent to retain and utilize historical interaction data; and (3) Long-Horizon Credit Assignment, where learning propagates returns over a time frame sufficient for the delayed benefits of…
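As a point of reference for the "explicit incentives" the abstract mentions, the following minimal sketch contrasts pure greedy action selection with two conventional exploration mechanisms on a multi-armed bandit: randomization (epsilon-greedy) and an uncertainty bonus (UCB-style). It is not the authors' method or code. The function names, the bonus coefficient `c`, and the example numbers are all illustrative assumptions.

```python
import math
import random


def greedy_action(value_estimates):
    """Pure exploitation: pick the arm with the highest estimated value."""
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])


def epsilon_greedy_action(value_estimates, epsilon, rng):
    """Randomization: with probability epsilon, pick a uniformly random arm."""
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))
    return greedy_action(value_estimates)


def ucb_action(value_estimates, pull_counts, total_pulls, c=2.0):
    """Uncertainty bonus (UCB-style): rarely pulled arms get a larger bonus.

    The coefficient c is a hypothetical choice made for this illustration.
    """
    def score(arm):
        if pull_counts[arm] == 0:
            return math.inf  # untried arms are always selected first
        bonus = c * math.sqrt(math.log(total_pulls) / pull_counts[arm])
        return value_estimates[arm] + bonus

    return max(range(len(value_estimates)), key=score)


# Illustrative numbers (assumed): arm 1 looks best, arm 2 is barely sampled.
values = [0.5, 0.6, 0.4]
counts = [10, 100, 1]
rng = random.Random(0)

print("greedy:", greedy_action(values))
print("epsilon-greedy:", epsilon_greedy_action(values, 0.1, rng))
print("ucb:", ucb_action(values, counts, sum(counts)))
```

With these numbers the greedy rule picks the arm with the highest current estimate, while the UCB-style rule prefers the barely sampled arm because its bonus dominates. The paper's hypothesis concerns agents that are trained without any such added term.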
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Experimental Behavioral Economics Studies
