Distilled Domain Randomization
Julien Brosseit, Benedikt Hahner, Fabio Muratore, Michael Gienger, Jan Peters

TL;DR
This paper introduces DiDoR, a method combining reinforcement learning with policy distillation to effectively transfer controllers from simulation to real robots without additional real-world data.
Contribution
It proposes a novel algorithm that distills policies from multiple simulated domains into a single robust policy for sim-to-real transfer.
Findings
In the reported experiments, DiDoR achieves performance comparable to or better than the baselines.
The method does not increase memory or computation time for deployment.
DiDoR effectively bridges the reality gap in robot control policies.
Abstract
Deep reinforcement learning is an effective tool to learn robot control policies from scratch. However, these methods are notorious for the enormous amount of required training data, which is prohibitively expensive to collect on real robots. A highly popular alternative is to learn from simulations, which allows the data to be generated much faster, safer, and cheaper. Since all simulators are mere models of reality, there are inevitable differences between the simulated and the real data, often referred to as the 'reality gap'. To bridge this gap, many approaches learn one policy from a distribution over simulators. In this paper, we propose to combine reinforcement learning from randomized physics simulations with policy distillation. Our algorithm, called Distilled Domain Randomization (DiDoR), distills so-called teacher policies, which are experts on domains that have been sampled initially,…
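The teacher-student idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical one-dimensional system `x' = x + gain * u`, replaces the reinforcement learning step with a closed-form controller per domain, and assumes the student is fit by least-squares regression onto the teachers' actions. All function names and parameter ranges are invented for illustration.

```python
import numpy as np


def sample_domains(rng, n_domains):
    """Sample one hypothetical physics parameter (an actuator gain) per domain."""
    return rng.uniform(0.5, 1.5, size=n_domains)


def train_teacher(gain):
    """Stand-in for RL training (assumption): return a feedback gain k such that
    u = -k * x halves the state each step in the domain x' = x + gain * u."""
    return 0.5 / gain


def rollout(k, gain, x0, steps):
    """Run the policy u = -k * x in one domain; return visited states and actions."""
    states, actions = [], []
    x = x0
    for _ in range(steps):
        u = -k * x
        states.append(x)
        actions.append(u)
        x = x + gain * u
    return np.array(states), np.array(actions)


def distill(teacher_gains, domains, rng, steps=20):
    """Fit one linear student u = -k_student * x to the actions that every
    teacher takes in its own domain (supervised least-squares regression)."""
    all_states, all_actions = [], []
    for k, gain in zip(teacher_gains, domains):
        x0 = rng.uniform(-1.0, 1.0)
        s, a = rollout(k, gain, x0, steps)
        all_states.append(s)
        all_actions.append(a)
    S = np.concatenate(all_states)
    A = np.concatenate(all_actions)
    # Closed-form minimizer of sum((-k * S - A) ** 2) over k.
    return -np.dot(S, A) / np.dot(S, S)


def didor_sketch(seed=0, n_domains=8):
    """Sample domains, build one teacher per domain, distill a single student."""
    rng = np.random.default_rng(seed)
    domains = sample_domains(rng, n_domains)
    teacher_gains = np.array([train_teacher(g) for g in domains])
    k_student = distill(teacher_gains, domains, rng)
    return domains, teacher_gains, k_student


if __name__ == "__main__":
    domains, teacher_gains, k_student = didor_sketch()
    print("student gain:", k_student)
    # The student is stable in a domain if |1 - gain * k_student| < 1.
    print("stable in all sampled domains:",
          bool(np.all(np.abs(1.0 - domains * k_student) < 1.0)))
```

In this toy setting the student is a single scalar gain, so deploying it costs no more than deploying one teacher, which mirrors the deployment claim in the findings above. The actual method uses learned policies and randomized physics simulators; only the overall structure (sample domains, obtain one expert per domain, distill into one policy) is taken from the abstract.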
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
