Safe Reinforcement Learning with Minimal Supervision
Alexander Quessy, Thomas Richardson, Sebastian East

TL;DR
This paper introduces an unsupervised method for collecting offline data for safe reinforcement learning, and examines how the quantity and quality of that data affect online safe policy learning, particularly in complex tasks with few demonstrations.
Contribution
The paper proposes an unsupervised RL-based data collection method and a novel online safe-RL algorithm called optimistic forgetting, both aimed at settings where demonstrations are scarce.
Findings
Sufficient demonstrations are crucial for optimal safe-RL policies.
Unsupervised data collection can effectively learn complex policies without hand-designed controllers.
Balancing diversity and optimality in data improves safe exploration.
Abstract
Reinforcement learning (RL) in the real world necessitates the development of procedures that enable agents to explore without causing harm to themselves or others. The most successful solutions to the problem of safe RL leverage offline data to learn a safe-set, enabling safe online exploration. However, this approach to safe-learning is often constrained by the demonstrations that are available for learning. In this paper we investigate the influence of the quantity and quality of data used to train the initial safe learning problem offline on the ability to learn safe-RL policies online. Specifically, we focus on tasks with spatially extended goal states where we have few or no demonstrations available. Classically this problem is addressed either by using hand-designed controllers to generate data or by collecting user-generated demonstrations. However, these methods are often…
Taxonomy
Topics: EEG and Brain-Computer Interfaces · Reinforcement Learning in Robotics · Anomaly Detection Techniques and Applications
