Satisficing Exploration for Deep Reinforcement Learning
Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy

TL;DR
This paper introduces a deep reinforcement learning approach that enables agents to efficiently learn satisficing policies by directly representing uncertainty over the value function, bypassing model-based planning.
Contribution
The paper extends existing satisficing exploration methods to deep RL by removing the need for model-based planning, which allows efficient learning in high-dimensional environments.
Findings
Enables deep RL agents to learn satisficing behaviors effectively.
Achieves more efficient synthesis of optimal behaviors when feasible.
Demonstrates the approach with simple experiments.
Abstract
A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the…
Taxonomy
Topics: Reinforcement Learning in Robotics
