Accelerating Reinforcement Learning with Suboptimal Guidance

Eivind Bøhn; Signe Moe; Tor Arne Johansen

arXiv:1911.09391 · cs.LG · November 22, 2019

TL;DR

This paper introduces an adaptive guidance method for reinforcement learning that uses a Q-filter to incorporate suboptimal controllers, improving exploration efficiency and overall performance in sparse-reward environments.

Contribution

The paper identifies shortcomings in existing Q-filter implementations and proposes modifications that improve adaptivity and performance in robotic reinforcement learning tasks.

Findings

01. Modified Q-filter improves guidance adaptivity

02. Enhanced exploration accelerates learning under sparse rewards

03. Performance increases across tested robotic environments

Abstract

Reinforcement Learning in domains with sparse rewards is a difficult problem, and a large part of the training process is often spent searching the state space in a more or less random fashion for any learning signals. For control problems, we often have some controller readily available which might be suboptimal but nevertheless solves the problem to some degree. This controller can be used to guide the initial exploration phase of the learning controller towards reward-yielding states, reducing the time before refinement of a viable policy can be initiated. In our work, the agent is guided through an auxiliary behaviour cloning loss which is made conditional on a Q-filter, i.e. it is only applied in situations where the critic deems the guiding controller to be better than the agent. The Q-filter provides a natural way to adjust the guidance throughout the training process, allowing…
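The Q-filtered behaviour cloning loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not include the modifications the paper proposes; it only shows the basic idea that the cloning term is applied where the critic rates the guiding controller's action above the agent's. The function name `q_filtered_bc_loss`, the squared-error cloning term, and the toy policy, guide, and critic are all assumptions made for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def q_filtered_bc_loss(
    states: Sequence[Vector],
    policy: Callable[[Vector], Vector],
    guide: Callable[[Vector], Vector],
    critic: Callable[[Vector, Vector], float],
) -> float:
    """Mean squared behaviour cloning loss over a batch of states, counted
    only where the critic prefers the guide's action (hypothetical helper)."""
    if not states:
        return 0.0
    total = 0.0
    for s in states:
        a_agent = policy(s)
        a_guide = guide(s)
        # Q-filter: keep the cloning term only if Q(s, a_guide) > Q(s, a_agent).
        if critic(s, a_guide) > critic(s, a_agent):
            total += sum((g - a) ** 2 for g, a in zip(a_guide, a_agent))
    return total / len(states)


# Toy 1-D setup (assumed for illustration): the critic scores an action by
# how close it is to the state, the agent always outputs 0, the guide 1.
def toy_critic(s: Vector, a: Vector) -> float:
    return -((a[0] - s[0]) ** 2)


def toy_policy(s: Vector) -> Vector:
    return [0.0]


def toy_guide(s: Vector) -> Vector:
    return [1.0]


# State 2.0: the guide is rated higher, so its cloning term (1.0) is counted.
# State 0.0: the agent is rated higher, so the cloning term is filtered out.
loss = q_filtered_bc_loss([[2.0], [0.0]], toy_policy, toy_guide, toy_critic)
print(loss)  # 0.5
```

In a full actor-critic setup this term would be added to the usual policy loss with some weighting, and the filter would switch the guidance off state by state as the critic begins to rate the agent's own actions above the guide's.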
