SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning

Marco Bagatella; Sammy Christen; Otmar Hilliges

arXiv:2205.13528·cs.LG·September 1, 2022

TL;DR

This paper introduces state-free priors, which leverage offline data to improve exploration in off-policy reinforcement learning. They are aimed at cases where the task differs significantly from the demonstration data, and they lead to faster learning in complex, sparse-reward environments.

Contribution

The work proposes two things: a state-free prior that models temporal consistency in demonstrated trajectories, and a new scheme for integrating action priors into off-policy RL.

Findings

01 Accelerates RL in long-horizon tasks

02 Effective with diverse offline data

03 Outperforms strong baselines

Abstract

Efficient exploration is a crucial challenge in deep reinforcement learning. Several methods, such as behavioral priors, are able to leverage offline data in order to efficiently accelerate reinforcement learning on complex tasks. However, if the task at hand deviates excessively from the demonstrated task, the effectiveness of such methods is limited. In our work, we propose to learn features from offline data that are shared by a more diverse range of tasks, such as correlation between actions and directedness. Therefore, we introduce state-free priors, which directly model temporal consistency in demonstrated trajectories, and are capable of driving exploration in complex tasks, even when trained on data collected on simpler tasks. Furthermore, we introduce a novel integration scheme for action priors in off-policy reinforcement learning by dynamically sampling actions from a…
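
To make the idea of a prior that conditions only on past actions more concrete, here is a minimal sketch. It is not the paper's model: it assumes a linear-Gaussian autoregressive stand-in fitted by ridge regression on synthetic action sequences, and the names `fit_action_prior` and `sample_from_prior` are hypothetical. The point is only that temporal correlation between consecutive actions can be learned without ever looking at states.

```python
import numpy as np


def fit_action_prior(trajectories, ridge=1e-3):
    """Fit a_t ~ a_{t-1} @ A + noise from action sequences only (no states).

    Assumption: a linear-Gaussian model, used here purely for illustration.
    """
    prev = np.concatenate([traj[:-1] for traj in trajectories])  # a_{t-1}
    nxt = np.concatenate([traj[1:] for traj in trajectories])    # a_t
    dim = prev.shape[1]
    # Closed-form ridge regression: A = (X^T X + ridge * I)^{-1} X^T Y
    A = np.linalg.solve(prev.T @ prev + ridge * np.eye(dim), prev.T @ nxt)
    # Per-dimension standard deviation of the one-step prediction residuals
    sigma = (nxt - prev @ A).std(axis=0)
    return A, sigma


def sample_from_prior(A, sigma, prev_action, rng):
    """Draw the next action given only the previous action."""
    return prev_action @ A + sigma * rng.standard_normal(prev_action.shape)


# Synthetic "offline data": temporally correlated 2-D action sequences.
rng = np.random.default_rng(0)
trajectories = []
for _ in range(50):
    actions = np.zeros((200, 2))
    actions[0] = rng.standard_normal(2)
    for t in range(1, 200):
        actions[t] = 0.9 * actions[t - 1] + 0.1 * rng.standard_normal(2)
    trajectories.append(actions)

A, sigma = fit_action_prior(trajectories)
next_action = sample_from_prior(A, sigma, trajectories[0][-1], rng)
```

Because the fitted model never sees a state, it could in principle be reused on a task whose state space differs from the one the data was collected on, which is the property the abstract emphasizes. How such a prior is combined with the learned policy during off-policy training is described in the paper itself and is not reproduced here.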

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Adversarial Robustness in Machine Learning