Exploration Unbound
Dilip Arumugam, Wanqiao Xu, Benjamin Van Roy

TL;DR
This paper presents a complex environment in which an agent benefits from exploration indefinitely, challenging the traditional view that optimal strategies eventually shift toward exploitation as knowledge accumulates.
Contribution
It introduces a simple, quintessential example of an environment in which an optimal agent never stops exploring: rewards are unbounded, and learning more always increases the rate at which rewards accumulate.
Findings
An optimal agent maintains a propensity to explore indefinitely.
Rewards are unbounded, and the agent can always increase the rate at which rewards accumulate by exploring to learn more.
Strategies that eventually settle into pure exploitation are suboptimal in this setting.
Abstract
A sequential decision-making agent balances between exploring to gain new knowledge about an environment and exploiting current knowledge to maximize immediate reward. For environments studied in the traditional literature, optimal decisions gravitate over time toward exploitation as the agent accumulates sufficient knowledge and the benefits of further exploration vanish. What if, however, the environment offers an unlimited amount of useful knowledge and there is large benefit to further exploration no matter how much the agent has learned? We offer a simple, quintessential example of such a complex environment. In this environment, rewards are unbounded and an agent can always increase the rate at which rewards accumulate by exploring to learn more. Consequently, an optimal agent forever maintains a propensity to explore.
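For contrast with the paper's environment, the traditional setting the abstract describes can be illustrated with a standard Bayesian bandit. The sketch below assumes a single Bernoulli arm with a Beta(1, 1) prior. This is a generic textbook model, not the environment constructed in the paper, and the helper name `beta_posterior_variance` is hypothetical. Because the arm's mean is a single fixed unknown, posterior variance shrinks roughly like 1/n as pulls accumulate. This offers one intuition for why the benefit of further exploration vanishes in such environments.

```python
def beta_posterior_variance(successes: int, failures: int,
                            prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Variance of the Beta posterior over a Bernoulli arm's mean.

    Hypothetical helper for illustration: assumes a Beta(prior_a, prior_b)
    prior updated with the observed successes and failures.
    """
    a = prior_a + successes
    b = prior_b + failures
    total = a + b
    # Variance of Beta(a, b) is ab / ((a + b)^2 (a + b + 1)).
    return (a * b) / (total ** 2 * (total + 1))


# Hypothetical trajectory: the arm pays off on half of all pulls.
for n in (0, 10, 100, 1000):
    v = beta_posterior_variance(n // 2, n - n // 2)
    print(f"pulls={n:5d}  posterior variance={v:.6f}")
```

With a fixed, finite set of unknowns, every additional pull buys less information. In the paper's environment, by contrast, there is always more useful knowledge to acquire.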
