Accelerating Reinforcement Learning with Suboptimal Guidance
Eivind Bøhn, Signe Moe, Tor Arne Johansen

TL;DR
This paper introduces an adaptive guidance method for reinforcement learning that leverages a Q-filter to incorporate suboptimal controllers, improving exploration efficiency and overall performance in sparse reward environments.
Contribution
It identifies shortcomings in existing Q-filter implementations and proposes modifications that enhance adaptivity and performance in robotic reinforcement learning tasks.
Findings
Modified Q-filter improves guidance adaptivity
Enhanced exploration accelerates learning in sparse-reward settings
Performance increases across tested robotic environments
Abstract
Reinforcement Learning in domains with sparse rewards is a difficult problem, and a large part of the training process is often spent searching the state space in a more or less random fashion for any learning signals. For control problems, we often have some controller readily available which might be suboptimal but nevertheless solves the problem to some degree. This controller can be used to guide the initial exploration phase of the learning controller towards reward yielding states, reducing the time before refinement of a viable policy can be initiated. In our work, the agent is guided through an auxiliary behaviour cloning loss which is made conditional on a Q-filter, i.e. it is only applied in situations where the critic deems the guiding controller to be better than the agent. The Q-filter provides a natural way to adjust the guidance throughout the training process, allowing…
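The abstract describes the Q-filter only in words: the behaviour cloning loss is applied only where the critic rates the guiding controller's action above the agent's. The sketch below makes that masking rule concrete. It is a minimal illustration, not the authors' implementation, and it does not include the paper's proposed modifications. It assumes deterministic continuous actions, a squared-error cloning loss, and averaging over the whole batch. The function and parameter names are hypothetical.

```python
import numpy as np


def q_filtered_bc_loss(q_guide, q_agent, agent_actions, guide_actions):
    """Hypothetical Q-filtered behaviour cloning loss (illustrative sketch).

    q_guide:       critic values Q(s, a_guide), shape (batch,)
    q_agent:       critic values Q(s, pi(s)), shape (batch,)
    agent_actions: actions proposed by the agent, shape (batch, action_dim)
    guide_actions: actions from the suboptimal controller, same shape
    """
    q_guide = np.asarray(q_guide, dtype=float)
    q_agent = np.asarray(q_agent, dtype=float)
    agent_actions = np.asarray(agent_actions, dtype=float)
    guide_actions = np.asarray(guide_actions, dtype=float)

    # Q-filter: 1.0 where the critic deems the guide's action better, else 0.0
    mask = (q_guide > q_agent).astype(float)

    # Squared-error cloning term per sample (assumed loss form)
    sq_err = np.sum((agent_actions - guide_actions) ** 2, axis=-1)

    # Samples that fail the filter contribute nothing; averaging over the
    # full batch is an assumption, not taken from the paper
    return float(np.mean(mask * sq_err))


# Usage: only the first sample passes the filter (1.0 > 0.5), so the
# loss is (1.0 + 0.0) / 2 = 0.5
loss = q_filtered_bc_loss(
    q_guide=[1.0, 0.2],
    q_agent=[0.5, 0.8],
    agent_actions=[[0.0, 0.0], [1.0, 1.0]],
    guide_actions=[[1.0, 0.0], [0.0, 0.0]],
)
print(loss)
```

Because the mask is recomputed from the critic on every batch, the guidance fades on its own once the agent's actions score higher than the controller's. This is the adaptivity the abstract attributes to the Q-filter.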
