Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics
Jun Jet Tai, Jordan K. Terry, Mauro S. Innocente, James Brusey, Nadjim Horri

TL;DR
This paper introduces CCGE, a method that uses epistemic uncertainty to effectively incorporate oracle policies into reinforcement learning, improving sample efficiency and performance especially in sparse reward environments.
Contribution
Proposes Critic Confidence Guided Exploration (CCGE), a novel approach that adaptively integrates oracle policies into actor-critic RL based on uncertainty estimates.
Findings
CCGE improves sample efficiency across benchmark tasks.
It performs well in sparse reward environments.
Effectiveness shown with multiple uncertainty estimation techniques.
Abstract
An inherent problem of reinforcement learning is performing exploration of an environment through random actions, of which a large portion can be unproductive. Instead, exploration can be improved by initializing the learning policy with an existing (previously learned or hard-coded) oracle policy, offline data, or demonstrations. In the case of using an oracle policy, it can be unclear how best to incorporate the oracle policy's experience into the learning policy in a way that maximizes learning sample efficiency. In this paper, we propose a method termed Critic Confidence Guided Exploration (CCGE) for incorporating such an oracle policy into standard actor-critic reinforcement learning algorithms. More specifically, CCGE takes in the oracle policy's actions as suggestions and incorporates this information into the learning scheme when uncertainty is high, while ignoring it when the…
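As a rough illustration of the gating idea the abstract describes (follow the oracle's suggestion when the critic is uncertain, otherwise act on the learner's own policy), the sketch below uses the disagreement of a small critic ensemble as a stand-in for epistemic uncertainty. This is a minimal sketch under stated assumptions, not the paper's exact rule: the function names `ensemble_uncertainty` and `choose_action`, the fixed `threshold`, and the toy linear critics are all hypothetical, and ensemble standard deviation is only one of several possible uncertainty estimates.

```python
import numpy as np


def ensemble_uncertainty(q_values):
    """Standard deviation across critic estimates, used here as a
    proxy for epistemic uncertainty (an assumption of this sketch)."""
    return float(np.std(np.asarray(q_values, dtype=float)))


def choose_action(learner_action, oracle_action, q_ensemble, state, threshold):
    """Return (action, used_oracle).

    The oracle's suggestion is taken only when the critics disagree
    about the value of the learner's action by more than `threshold`
    (a hypothetical hyperparameter).
    """
    q_values = [q(state, learner_action) for q in q_ensemble]
    if ensemble_uncertainty(q_values) > threshold:
        return oracle_action, True
    return learner_action, False


# Toy critics (hypothetical): they agree at state 0 and diverge as |state| grows.
critics = [lambda s, a, w=w: w * s + a for w in (0.9, 1.0, 1.1)]

familiar = choose_action(0.1, 0.7, critics, state=0.0, threshold=0.5)
unfamiliar = choose_action(0.1, 0.7, critics, state=10.0, threshold=0.5)
```

In this toy setup, `familiar` keeps the learner's action because the critics agree exactly, while `unfamiliar` switches to the oracle's action because the ensemble spread exceeds the threshold. A practical variant would have to decide how the threshold is set and how oracle-guided transitions enter the actor-critic update, neither of which this sketch addresses.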
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: 1x1 Convolution · Global Average Pooling · Dilated Convolution · Convolution · Average Pooling · Switchable Atrous Convolution
