Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics

Jun Jet Tai; Jordan K. Terry; Mauro S. Innocente; James Brusey; Nadjim Horri

arXiv:2208.10533 · cs.LG · August 22, 2023 · 1 cites

Open Access

TL;DR

This paper introduces CCGE, a method that uses epistemic uncertainty to effectively incorporate oracle policies into reinforcement learning, improving sample efficiency and performance especially in sparse reward environments.

Contribution

Proposes Critic Confidence Guided Exploration (CCGE), a novel approach that adaptively integrates oracle policies into actor-critic RL based on uncertainty estimates.

Findings

1. CCGE improves sample efficiency across benchmark tasks.
2. It performs well in sparse reward environments.
3. Effectiveness shown with multiple uncertainty estimation techniques.

Abstract

An inherent problem of reinforcement learning is performing exploration of an environment through random actions, of which a large portion can be unproductive. Instead, exploration can be improved by initializing the learning policy with an existing (previously learned or hard-coded) oracle policy, offline data, or demonstrations. In the case of using an oracle policy, it can be unclear how best to incorporate the oracle policy's experience into the learning policy in a way that maximizes learning sample efficiency. In this paper, we propose a method termed Critic Confidence Guided Exploration (CCGE) for incorporating such an oracle policy into standard actor-critic reinforcement learning algorithms. More specifically, CCGE takes in the oracle policy's actions as suggestions and incorporates this information into the learning scheme when uncertainty is high, while ignoring it when the…
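The gating idea described in the abstract, following the oracle's suggestion when the critic is uncertain and the learner's own action otherwise, can be illustrated with a minimal sketch. The text shown here does not say how uncertainty is measured or how the oracle's action enters the update, so everything below is an assumption: uncertainty is taken to be the disagreement (population standard deviation) among a small ensemble of critics, and the switch is a hard threshold. The names `ensemble_uncertainty`, `choose_action`, and `threshold` are hypothetical and are not taken from the paper.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]
Critic = Callable[[State, Action], float]


def ensemble_uncertainty(critics: List[Critic], state: State, action: Action) -> float:
    """Assumed uncertainty proxy: spread of Q-estimates across an ensemble of critics."""
    q_values = [critic(state, action) for critic in critics]
    return statistics.pstdev(q_values)


def choose_action(
    critics: List[Critic],
    state: State,
    learner_action: Action,
    oracle_action: Action,
    threshold: float,
) -> Tuple[Action, str]:
    """Return the oracle's suggestion when the critics disagree strongly,
    otherwise the learner's own action. The hard threshold is an
    illustrative choice, not the paper's rule."""
    uncertainty = ensemble_uncertainty(critics, state, learner_action)
    if uncertainty > threshold:
        return oracle_action, "oracle"
    return learner_action, "learner"


# Toy critics (hypothetical): they agree near state 0 and diverge as the state grows.
critics = [lambda s, a, w=w: w * s[0] + a[0] for w in (0.9, 1.0, 1.1)]

familiar = choose_action(critics, [0.0], [0.2], [0.7], threshold=0.5)
unfamiliar = choose_action(critics, [10.0], [0.2], [0.7], threshold=0.5)
print(familiar[1], unfamiliar[1])
```

In this toy setup the critics agree exactly at the first state, so the learner's action is kept, while at the second state their estimates spread out and the oracle's suggestion is used instead. A real actor-critic implementation would estimate uncertainty from learned networks and may blend the two sources rather than switch between them.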

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: 1x1 Convolution · Global Average Pooling · Dilated Convolution · Convolution · Average Pooling · Switchable Atrous Convolution