Learning to maintain safety through expert demonstrations in settings with unknown constraints: A Q-learning perspective
George Papadopoulos, George A. Vouros

TL;DR
This paper introduces SafeQIL, a safe Q-learning based inverse reinforcement learning method that learns policies from expert demonstrations in constrained environments, balancing reward maximization with safety considerations.
Contribution
It proposes a novel safe Q-learning framework for inverse constrained reinforcement learning, effectively incorporating safety into the learning process from demonstrations.
Findings
SafeQIL outperforms existing inverse constraint RL algorithms on benchmark tasks.
The method effectively balances reward maximization and safety.
SafeQIL demonstrates robustness in unknown constraint settings.
Abstract
Given a set of trajectories demonstrating the safe execution of a task in a constrained MDP with observable rewards but with unknown constraints and non-observable costs, we aim to find a policy that maximizes the likelihood of the demonstrated trajectories, trading off being conservative against significantly increasing the likelihood of high-reward trajectories that may contain unsafe steps. Having these objectives, we aim towards learning a policy that maximizes the probability of the most trajectories with respect to the demonstrations. In so doing, we formulate the "promise" of individual state-action pairs in terms of values, which depend on task-specific rewards as well as on the assessment of states' safety, mixing expectations in terms of rewards and safety. This entails a safe Q-learning perspective of the inverse learning problem under…
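The idea of a value that mixes reward and safety expectations can be illustrated with a small tabular sketch. This is not SafeQIL itself: the abstract does not give the update rule, so everything below is an assumption made for illustration. In particular, the visitation-frequency safety score, the function names `demo_safety_scores` and `safe_q_update`, and the `penalty` weight are all hypothetical.

```python
from collections import defaultdict


def demo_safety_scores(demos):
    """Score each state in [0, 1] by how often the demonstrations visit it.

    Assumption: visitation frequency is used as a crude proxy for safety.
    The paper's actual safety assessment is not described in the abstract.
    """
    counts = defaultdict(int)
    for trajectory in demos:
        for state, _action in trajectory:
            counts[state] += 1
    max_count = max(counts.values())
    return {state: count / max_count for state, count in counts.items()}


def safe_q_update(q, state, action, reward, next_state, actions, safety,
                  alpha=0.5, gamma=0.9, penalty=1.0):
    """One Q-learning step whose target mixes task reward and a safety term.

    States never seen in the demonstrations get safety 0.0, so moving into
    them incurs the full (hypothetical) penalty.
    """
    unsafe = 1.0 - safety.get(next_state, 0.0)
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward - penalty * unsafe + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Two toy demonstrations over string-named states.
demos = [[("s0", "right"), ("s1", "right")], [("s0", "right")]]
safety = demo_safety_scores(demos)
actions = ["left", "right"]
q = defaultdict(float)

# "right" leads to a demonstrated state; "left" pays more reward but leads
# to a state the demonstrations never visit.
q_right = safe_q_update(q, "s0", "right", 1.0, "s1", actions, safety, penalty=3.0)
q_left = safe_q_update(q, "s0", "left", 2.0, "s9", actions, safety, penalty=3.0)
```

With a large enough `penalty`, the higher-reward action that leaves the demonstrated region ends up with the lower value, which is the conservative side of the trade-off the abstract describes. A smaller `penalty` shifts the balance back toward reward.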
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning
