SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning
Marco Bagatella, Sammy Christen, Otmar Hilliges

TL;DR
This paper introduces state-free priors, which use offline data to improve exploration in off-policy reinforcement learning, especially when the target task differs significantly from the demonstrated one. The result is faster learning in complex, sparse-reward environments.
Contribution
The work proposes state-free priors, which model temporal consistency in demonstrated trajectories, together with a new integration scheme for action priors in off-policy RL.
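To make the idea of a prior that depends only on past actions more concrete, here is a minimal toy sketch. It is not the authors' model: this page does not say how the prior is parameterized, so the sketch assumes a simple linear-Gaussian form, a_t ~ N(A a_{t-1}, diag(sigma^2)), fitted by least squares on action sequences alone. The function names, the synthetic "demonstrations", and the decay factor 0.9 are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_state_free_prior(action_seqs):
    """Fit a_t ~ N(A @ a_{t-1}, diag(sigma^2)) from action sequences only.

    Assumption: a linear-Gaussian stand-in for a state-free prior.
    No environment states are used anywhere in the fit.
    """
    prev = np.concatenate([seq[:-1] for seq in action_seqs])  # (N, d)
    nxt = np.concatenate([seq[1:] for seq in action_seqs])    # (N, d)
    # Least squares for nxt ~= prev @ W, hence A = W.T
    W, *_ = np.linalg.lstsq(prev, nxt, rcond=None)
    resid = nxt - prev @ W
    sigma = resid.std(axis=0) + 1e-6  # per-dimension noise scale
    return W.T, sigma


def sample_prior_action(A, sigma, prev_action, rng):
    """Draw the next action given only the previous action."""
    noise = rng.standard_normal(prev_action.shape)
    return A @ prev_action + sigma * noise


# Synthetic, temporally correlated "demonstrations" (hypothetical data).
rng = np.random.default_rng(0)
demos = []
for _ in range(20):
    seq = np.zeros((200, 2))
    seq[0] = rng.standard_normal(2)
    for t in range(1, 200):
        seq[t] = 0.9 * seq[t - 1] + 0.1 * rng.standard_normal(2)
    demos.append(seq)

A, sigma = fit_state_free_prior(demos)

# Roll the prior forward for a few steps without observing any state.
prev = np.zeros(2)
rollout = []
for _ in range(5):
    prev = sample_prior_action(A, sigma, prev, rng)
    rollout.append(prev)
rollout = np.stack(rollout)
```

Because the fitted prior never sees a state, it can in principle be reused on a task with a different state space, as long as the action space matches. How such a prior is combined with the learned policy during exploration is left out here, since the description above does not give those details.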
Findings
Accelerates RL in long-horizon tasks
Effective with diverse offline data
Outperforms strong baselines
Abstract
Efficient exploration is a crucial challenge in deep reinforcement learning. Several methods, such as behavioral priors, are able to leverage offline data in order to efficiently accelerate reinforcement learning on complex tasks. However, if the task at hand deviates excessively from the demonstrated task, the effectiveness of such methods is limited. In our work, we propose to learn features from offline data that are shared by a more diverse range of tasks, such as correlation between actions and directedness. Therefore, we introduce state-free priors, which directly model temporal consistency in demonstrated trajectories, and are capable of driving exploration in complex tasks, even when trained on data collected on simpler tasks. Furthermore, we introduce a novel integration scheme for action priors in off-policy reinforcement learning by dynamically sampling actions from a…
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Adversarial Robustness in Machine Learning
