Decoupled Reinforcement Learning to Stabilise Intrinsically-Motivated Exploration

Lukas Schäfer; Filippos Christianos; Josiah P. Hanna; Stefano V. Albrecht

arXiv:2107.08966·cs.LG·February 10, 2022·5 cites

TL;DR

This paper introduces Decoupled RL, a framework that trains separate policies for exploration and exploitation in reinforcement learning, improving robustness and sample efficiency when using intrinsic rewards.

Contribution

Decoupled RL is a novel framework that separates exploration and exploitation policies, addressing instability issues caused by intrinsic reward shaping.

Findings

01. DeRL is more robust to the scale and rate of decay of intrinsic rewards.

02. DeRL reaches the same evaluation returns as intrinsically-motivated baselines in fewer interactions.

03. Divergence constraint regularisers reduce instability caused by divergence between the exploration and exploitation policies.

Abstract

Intrinsic rewards can improve exploration in reinforcement learning, but the exploration process may suffer from instability caused by non-stationary reward shaping and strong dependency on hyperparameters. In this work, we introduce Decoupled RL (DeRL) as a general framework which trains separate policies for intrinsically-motivated exploration and exploitation. Such decoupling allows DeRL to leverage the benefits of intrinsic rewards for exploration while demonstrating improved robustness and sample efficiency. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. Our results show that DeRL is more robust to varying scale and rate of decay of intrinsic rewards and converges to the same evaluation returns as intrinsically-motivated baselines in fewer interactions. Lastly, we discuss the challenge of distribution shift and show that…
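
The decoupling idea in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the chain environment, the count-based bonus, tabular Q-learning, and every name and hyperparameter below (`step`, `train_decoupled`, `beta`, and so on) are assumptions chosen for brevity. The sketch also assumes that the exploitation policy learns off-policy from transitions gathered by the exploration policy. The point it shows is that the intrinsic bonus only ever enters the exploration values, so the exploitation values stay tied to the extrinsic reward.

```python
import numpy as np

# Hypothetical toy task: a chain of 8 states with a sparse reward at the end.
N_STATES, N_ACTIONS, GOAL = 8, 2, 7


def step(state, action):
    """Deterministic chain: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, float(done), done  # extrinsic reward is 1 only at the goal


def train_decoupled(episodes=300, max_steps=50, alpha=0.5, gamma=0.95,
                    beta=0.5, epsilon=0.1, seed=0):
    """Train two Q-tables on the same transitions (illustrative only)."""
    rng = np.random.default_rng(seed)
    q_explore = np.zeros((N_STATES, N_ACTIONS))  # extrinsic + intrinsic
    q_exploit = np.zeros((N_STATES, N_ACTIONS))  # extrinsic only
    counts = np.zeros(N_STATES)                  # visit counts for the bonus

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Behaviour comes from the exploration values only.
            if rng.random() < epsilon:
                a = int(rng.integers(N_ACTIONS))
            else:
                a = int(np.argmax(q_explore[s]))

            s2, r_ext, done = step(s, a)

            # Assumed intrinsic reward: a count-based novelty bonus.
            counts[s2] += 1
            r_int = beta / np.sqrt(counts[s2])

            cont = 0.0 if done else 1.0  # no bootstrapping past the goal

            # Exploration policy: Q-learning on extrinsic + intrinsic reward.
            target_e = r_ext + r_int + gamma * cont * q_explore[s2].max()
            q_explore[s, a] += alpha * (target_e - q_explore[s, a])

            # Exploitation policy: off-policy Q-learning on extrinsic reward.
            target_x = r_ext + gamma * cont * q_exploit[s2].max()
            q_exploit[s, a] += alpha * (target_x - q_exploit[s, a])

            s = s2
            if done:
                break
    return q_explore, q_exploit


if __name__ == "__main__":
    q_explore, q_exploit = train_decoupled()
    print("greedy exploitation actions:", q_exploit.argmax(axis=1))
```

Because the exploitation table never sees `r_int`, changing `beta` or the decay of the bonus alters which transitions are collected but not the targets the exploitation values are regressed towards. Learning from data gathered by a different policy is also where the distribution shift mentioned in the abstract can arise.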

Code & Models

Repositories

uoe-agents/derl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics