Better Exploration with Optimistic Actor-Critic
Kamil Ciosek, Quan Vuong, Robert Loftin, Katja Hofmann

TL;DR
This paper introduces Optimistic Actor Critic (OAC), a reinforcement learning algorithm that uses confidence bounds on the critic to explore more efficiently, achieving state-of-the-art sample efficiency on continuous control tasks.
Contribution
The paper proposes Optimistic Actor Critic, which incorporates confidence bounds into actor-critic methods to address two limitations of existing algorithms such as Soft Actor Critic: pessimistic underexploration and directionally uninformed action sampling.
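Confidence bounds of this kind can be illustrated with the two critics that actor-critic methods like Soft Actor Critic already train. The sketch below is a minimal illustration under our own assumption that the mean of the two Q-estimates acts as the point estimate and their half-spread acts as an uncertainty proxy; the function name `q_bounds` and the coefficients `beta_lb` and `beta_ub` are hypothetical, not taken from the paper.

```python
from typing import Tuple


def q_bounds(q1: float, q2: float, beta_lb: float, beta_ub: float) -> Tuple[float, float]:
    """Return (lower, upper) confidence bounds from two critic estimates.

    Assumption for illustration: the critics' mean stands in for the
    expected Q-value and their half-spread for its uncertainty.
    """
    mean = (q1 + q2) / 2.0
    spread = abs(q1 - q2) / 2.0  # disagreement between critics as an uncertainty proxy
    lower = mean - beta_lb * spread  # pessimistic estimate
    upper = mean + beta_ub * spread  # optimistic estimate
    return lower, upper


# With beta_lb = 1 the lower bound equals min(q1, q2), the clipped
# double-Q estimate used by Soft Actor Critic.
lo, hi = q_bounds(3.0, 5.0, beta_lb=1.0, beta_ub=2.0)
print(lo, hi)  # 3.0 6.0
```

The choice `beta_lb = 1` is shown because it recovers the familiar pessimistic minimum; a larger `beta_ub` makes the optimistic bound reward critic disagreement more strongly.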
Findings
Achieves state-of-the-art sample efficiency in continuous control tasks.
Addresses pessimistic underexploration by using confidence bounds.
Demonstrates improved exploration through directed action sampling.
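Directed action sampling can be sketched as shifting the exploration distribution's mean toward actions that an optimistic critic rates highly, instead of sampling symmetrically around the current mean. This is a minimal sketch under our own assumptions: the policy is a diagonal Gaussian with per-dimension `std`, `grad_ub` is the gradient of an upper confidence bound on Q with respect to the action (computed elsewhere, e.g. by automatic differentiation), and `delta` is a hypothetical KL budget limiting how far the exploration mean may move. All function names are illustrative.

```python
import math
import random
from typing import List


def shifted_mean(mu: List[float], std: List[float], grad_ub: List[float], delta: float) -> List[float]:
    """Move the Gaussian mean along grad_ub so that
    KL(N(new_mu, diag(std^2)) || N(mu, diag(std^2))) == delta."""
    # Sigma @ g for the diagonal covariance Sigma = diag(std**2)
    sigma_g = [s * s * g for s, g in zip(std, grad_ub)]
    # Sigma-weighted gradient norm: sqrt(g^T Sigma g)
    norm = math.sqrt(sum(g * sg for g, sg in zip(grad_ub, sigma_g)))
    if norm == 0.0:
        return list(mu)  # no directional information: keep the original mean
    scale = math.sqrt(2.0 * delta) / norm
    return [m + scale * sg for m, sg in zip(mu, sigma_g)]


def gaussian_kl_same_cov(mu_a: List[float], mu_b: List[float], std: List[float]) -> float:
    """KL divergence between two diagonal Gaussians sharing the same std."""
    return 0.5 * sum(((a - b) / s) ** 2 for a, b, s in zip(mu_a, mu_b, std))


def sample_exploratory_action(mu, std, grad_ub, delta, rng=random):
    """Sample from the shifted Gaussian; only the mean moves, the std is unchanged."""
    mu_e = shifted_mean(mu, std, grad_ub, delta)
    return [rng.gauss(m, s) for m, s in zip(mu_e, std)]


mu, std = [0.0, 0.0], [0.5, 1.0]
new_mu = shifted_mean(mu, std, grad_ub=[1.0, 1.0], delta=0.5)
print(gaussian_kl_same_cov(new_mu, mu, std))  # approximately 0.5, i.e. delta
action = sample_exploratory_action(mu, std, [1.0, 1.0], 0.5, rng=random.Random(0))
```

Because the shift is divided by the Σ-weighted norm of the gradient, the KL divergence between the shifted and original Gaussians equals `delta` whatever the gradient's magnitude, so only the gradient's direction determines where exploration is pushed.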
Abstract
Actor-critic methods, a class of model-free reinforcement learning algorithms, have been successfully applied to challenging tasks in continuous control, often achieving state-of-the-art performance. However, wide-scale adoption of these methods in real-world domains is made difficult by their poor sample efficiency. We address this problem both theoretically and empirically. On the theoretical side, we identify two phenomena preventing efficient exploration in existing state-of-the-art algorithms such as Soft Actor Critic. First, combining a greedy actor update with a pessimistic estimate of the critic leads to the avoidance of actions that the agent does not know about, a phenomenon we call pessimistic underexploration. Second, current algorithms are directionally uninformed, sampling actions with equal probability in opposite directions from the current mean. This is wasteful, since we typically…
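Pessimistic underexploration can be made concrete with a toy example. The sketch below is ours, not the authors': it assumes three discrete candidate actions with hand-picked, hypothetical estimates from two critics, and it scores each action by the critics' mean plus `beta` times their half-spread. A greedy actor that maximizes the pessimistic score (`beta = -1`, the minimum of the two critics) settles on the well-known action, while the optimistic score (`beta = +1`) selects the action the critics disagree about.

```python
from typing import Dict, Tuple

# Hypothetical critic estimates (q1, q2) for three candidate actions.
# Both critics agree on the well-visited action 0.0; they disagree
# sharply on the rarely tried action 1.0.
CRITICS: Dict[float, Tuple[float, float]] = {
    -1.0: (0.5, 0.3),
    0.0: (1.0, 1.0),
    1.0: (3.0, -1.0),
}


def score(q1: float, q2: float, beta: float) -> float:
    """Mean plus beta times half-spread: beta < 0 is pessimistic, beta > 0 optimistic."""
    return (q1 + q2) / 2.0 + beta * abs(q1 - q2) / 2.0


def greedy_action(beta: float) -> float:
    """Action a greedy actor would pick under the given score."""
    return max(CRITICS, key=lambda a: score(*CRITICS[a], beta))


print(greedy_action(-1.0))  # pessimistic (min of critics) picks the known action: 0.0
print(greedy_action(+1.0))  # optimistic picks the uncertain action: 1.0
```

Under the pessimistic score the uncertain action is never chosen, so its estimate is never corrected by new data; this is the avoidance behavior the abstract describes.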
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)
Methods: Adam · Experience Replay · Dense Connections · Soft Actor Critic
