Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations
Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg

TL;DR
This paper introduces MCAC, a simple yet effective modification to actor-critic algorithms that improves exploration and learning efficiency in sparse-reward RL tasks by leveraging demonstrations and Monte Carlo return estimates.
Contribution
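MCAC is a parameter-free enhancement to standard actor-critic methods that initializes the replay buffer with demonstrations and trains the critic on the maximum of the TD target and a Monte Carlo estimate of the reward-to-go, which encourages exploration near high-reward trajectories.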
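As a worked form of that critic target, the sketch below writes the TD target, the Monte Carlo reward-to-go, and their maximum. The notation is assumed rather than taken from the paper: \bar{Q} is a target critic, \pi the current policy, \gamma the discount factor, and T the last step of the trajectory containing (s_t, a_t); terminal-state masking of the bootstrap term is omitted for brevity.

```latex
\begin{aligned}
Q^{\mathrm{TD}}_{\mathrm{target}}(s_t, a_t) &= r_t + \gamma \, \bar{Q}(s_{t+1}, a'), \qquad a' \sim \pi(\cdot \mid s_{t+1}) \\
Q^{\mathrm{MC}}_{\mathrm{target}}(s_t, a_t) &= \sum_{k=t}^{T} \gamma^{\,k-t} \, r_k \\
Q^{\mathrm{MCAC}}_{\mathrm{target}}(s_t, a_t) &= \max\!\left( Q^{\mathrm{TD}}_{\mathrm{target}}(s_t, a_t),\; Q^{\mathrm{MC}}_{\mathrm{target}}(s_t, a_t) \right)
\end{aligned}
```

Because the maximum never falls below the observed return, states along a successful demonstration keep high value estimates even before the critic has learned to bootstrap that reward backward.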
Findings
MCAC significantly improves learning efficiency in sparse reward environments.
It outperforms existing RL and RL-from-demonstrations algorithms across multiple control domains.
The method is simple to implement, adds no hyperparameters, and can be applied on top of standard actor-critic algorithms.
Abstract
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is to use demonstrations to provide an initial signal about regions of the state space with high rewards. However, prior RL-from-demonstrations algorithms introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor-Critic (MCAC), a parameter-free modification to standard actor-critic algorithms which initializes the replay buffer with demonstrations and computes a modified Q-value by taking the maximum of the standard temporal difference (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in…
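A minimal NumPy sketch of the two ingredients named in the abstract: seeding a replay buffer with demonstration transitions that carry their Monte Carlo reward-to-go, and taking the elementwise maximum of the one-step TD target and that return. This is not the authors' implementation. The names `discounted_reward_to_go`, `mcac_targets`, and `DemoReplayBuffer` are hypothetical, and the next-state Q-values are assumed to come from whatever target critic the underlying actor-critic algorithm uses.

```python
import numpy as np


def discounted_reward_to_go(rewards, gamma):
    """Monte Carlo return G_t = sum_{k>=t} gamma^(k-t) * r_k for one finished trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    returns = np.zeros_like(rewards)
    running = 0.0
    # Accumulate backward so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mcac_targets(rewards, next_q, dones, mc_returns, gamma):
    """Elementwise max of the one-step TD target and the stored Monte Carlo return."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)  # assumed: target-critic Q(s', a') values
    dones = np.asarray(dones, dtype=float)
    # Standard bootstrapped TD target; no bootstrap past a terminal state.
    td_target = rewards + gamma * (1.0 - dones) * next_q
    return np.maximum(td_target, np.asarray(mc_returns, dtype=float))


class DemoReplayBuffer:
    """Hypothetical buffer in which every transition also stores its Monte Carlo return."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.transitions = []

    def add_trajectory(self, states, actions, rewards, next_states, dones):
        # The return is only known once the whole trajectory is available.
        mc = discounted_reward_to_go(rewards, self.gamma)
        for i in range(len(rewards)):
            self.transitions.append(
                (states[i], actions[i], rewards[i], next_states[i], dones[i], mc[i])
            )


# Usage: a sparse-reward demonstration that earns reward only at its final step.
buffer = DemoReplayBuffer(gamma=0.9)
buffer.add_trajectory([0, 1, 2], [0, 0, 0], [0, 0, 1], [1, 2, 3], [0, 0, 1])
mc = np.array([tr[-1] for tr in buffer.transitions])
targets = mcac_targets([0, 0, 1], [0.1, 0.2, 5.0], [0, 0, 1], mc, gamma=0.9)
```

In this usage example the untrained critic's small next-state estimates would give TD targets near zero for the early steps, but the maximum lifts them to the demonstration's discounted return, which is the mechanism the abstract credits for encouraging exploration near high-reward trajectories.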
Taxonomy
Topics: Reinforcement Learning in Robotics
