Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation
Mohamad H. Danesh, Maxime Wabartha, Stanley Wu, Joelle Pineau, Hsiu-Chin Lin

TL;DR
This paper introduces UARL, a safe reinforcement learning framework that uses uncertainty estimation and progressive environment randomization to improve policy robustness and safety without direct target domain interactions.
Contribution
UARL is a novel approach that combines ensemble critic uncertainty with progressive domain randomization for safe, out-of-distribution policy adaptation in RL.
Findings
UARL detects out-of-distribution (OOD) states with high accuracy.
UARL outperforms baselines in robustness and sample efficiency.
UARL demonstrates successful transfer to real-world robot tasks.
Abstract
Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness through direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining over high-uncertainty regions of the state space in simulated environments,…
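The two ingredients named in the abstract, ensemble-critic uncertainty and progressively widened randomization, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that uncertainty is measured as the standard deviation of Q-value estimates across the ensemble, that a state is flagged as OOD when this value exceeds a fixed threshold, and that randomization widens symmetrically around a nominal parameter value. The names `make_toy_critic`, `ensemble_uncertainty`, `is_ood`, and `widening_ranges` are hypothetical and chosen only for illustration.

```python
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple

# A critic maps (state, action) to a scalar Q-value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def make_toy_critic(weight: float) -> Critic:
    """Hypothetical stand-in for a learned critic.

    Critics built with different weights agree near the origin and
    disagree more as the state moves away from it.
    """
    def critic(state: Sequence[float], action: Sequence[float]) -> float:
        return weight * sum(state) + sum(action)
    return critic


def ensemble_uncertainty(critics: List[Critic],
                         state: Sequence[float],
                         action: Sequence[float]) -> float:
    """Assumed uncertainty measure: std of Q-values across the ensemble."""
    q_values = [critic(state, action) for critic in critics]
    return statistics.pstdev(q_values)


def is_ood(critics: List[Critic],
           state: Sequence[float],
           action: Sequence[float],
           threshold: float) -> bool:
    """Flag a state-action pair as OOD when the critics disagree strongly."""
    return ensemble_uncertainty(critics, state, action) > threshold


def widening_ranges(nominal: float, step: float,
                    stages: int) -> Iterator[Tuple[float, float]]:
    """Assumed schedule: widen the sampling range around a nominal value."""
    for k in range(1, stages + 1):
        yield (nominal - k * step, nominal + k * step)


if __name__ == "__main__":
    critics = [make_toy_critic(w) for w in (0.9, 1.0, 1.1)]
    action = [0.5]
    for state in ([0.1, 0.1], [10.0, 10.0]):
        flagged = is_ood(critics, state, action, threshold=0.5)
        print(state, "OOD" if flagged else "in-distribution")
    for low, high in widening_ranges(nominal=1.0, step=0.1, stages=3):
        print(f"randomize parameter in [{low:.1f}, {high:.1f}]")
```

In a real setup the toy critics would be replaced by trained Q-networks, and the threshold would need to be calibrated on data; both choices here are placeholders.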
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
