Satisficing Exploration for Deep Reinforcement Learning

Dilip Arumugam; Saurabh Kumar; Ramki Gummadi; Benjamin Van Roy

arXiv:2407.12185·cs.LG·July 23, 2024

TL;DR

This paper introduces a deep reinforcement learning approach that enables agents to efficiently learn satisficing policies by directly representing uncertainty over the value function, bypassing model-based planning.

Contribution

The paper extends existing satisficing exploration methods to deep RL by removing their reliance on model-based planning, which allows efficient learning in high-dimensional environments.
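For intuition about what a satisficing exploration rule can look like, here is a minimal sketch of satisficing Thompson sampling on a Bernoulli bandit. It is not the paper's algorithm, which targets deep RL and uncertainty over the value function; it is a tabular stand-in, and the function names, the `epsilon` tolerance, and the Beta(1, 1) prior are all assumptions made for illustration.

```python
import numpy as np

def satisficing_thompson_step(successes, failures, epsilon, rng):
    """Choose an arm from one posterior sample (illustrative helper).

    Samples a mean for every arm from its Beta posterior, then returns the
    lowest-index arm whose sampled mean is within `epsilon` of the sampled best.
    With epsilon = 0 this reduces to ordinary Thompson sampling.
    """
    samples = rng.beta(successes + 1.0, failures + 1.0)  # Beta(1, 1) prior (assumed)
    good_enough = np.flatnonzero(samples >= samples.max() - epsilon)
    return int(good_enough[0])

def run_bandit(true_means, epsilon, steps, seed=0):
    """Simulate the rule above; returns per-arm pull counts and average reward."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    successes = np.zeros(k)
    failures = np.zeros(k)
    total = 0.0
    for _ in range(steps):
        arm = satisficing_thompson_step(successes, failures, epsilon, rng)
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        total += reward
    return successes + failures, total / steps

if __name__ == "__main__":
    for eps in (0.0, 0.1):
        pulls, avg = run_bandit([0.2, 0.5, 0.8], epsilon=eps, steps=500)
        print(f"epsilon={eps}: pulls={pulls}, average reward={avg:.3f}")
```

A larger `epsilon` lets the agent settle on any arm it believes is close enough to the best, trading some reward for less exploration; `epsilon = 0` recovers the usual optimality-seeking behavior.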

Findings

01. Enables deep RL agents to learn satisficing behaviors effectively.

02. Achieves more efficient synthesis of optimal behaviors when feasible.

03. Demonstrates the approach with simple experiments.

Abstract

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the…

Taxonomy

Topics: Reinforcement Learning in Robotics