Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization
Yuki Kadokawa, Lingwei Zhu, Yoshihisa Tsurumine, Takamitsu Matsubara

TL;DR
Cyclic Policy Distillation (CPD) improves sample efficiency in sim-to-real reinforcement learning with domain randomization. It divides the range of randomized parameters into sub-domains, learns a local policy for each sub-domain while cycling through them, and distills the local policies into a single global policy. The method is demonstrated on multiple tasks.
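The first two steps of the summary above (splitting a parameter range into sub-domains and visiting them in a cycle) can be illustrated with a minimal sketch. The names `split_range` and `sweep_order`, the equal-width split, and the forward-then-backward visiting order are assumptions made for illustration; the summary states only that sub-domains are visited cyclically, not the exact schedule.

```python
from typing import Iterator, List, Tuple


def split_range(low: float, high: float, n: int) -> List[Tuple[float, float]]:
    """Split [low, high] into n equal-width sub-domains (assumed split rule)."""
    if n < 1 or high <= low:
        raise ValueError("need n >= 1 and high > low")
    width = (high - low) / n
    return [(low + i * width, low + (i + 1) * width) for i in range(n)]


def sweep_order(n: int, sweeps: int) -> Iterator[int]:
    """Yield sub-domain indices, alternating forward and backward sweeps.

    Assumed schedule: consecutive visits are always to the same or a
    neighboring sub-domain, so knowledge can be passed between neighbors.
    """
    forward = True
    for _ in range(sweeps):
        yield from (range(n) if forward else range(n - 1, -1, -1))
        forward = not forward


# Hypothetical example: a friction coefficient randomized over [0.0, 1.0].
sub_domains = split_range(0.0, 1.0, 4)
visit_order = list(sweep_order(len(sub_domains), sweeps=2))
```

In this sketch each index in `visit_order` would select the sub-domain whose local policy is trained next; the training step itself is omitted.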
Contribution
The paper introduces CPD, a novel method that improves sample efficiency in domain randomization for reinforcement learning by cyclically learning local policies and distilling them into a global policy.
Findings
CPD significantly reduces sample requirements in simulations.
CPD achieves effective sim-to-real transfer on robotic tasks.
The method outperforms baseline approaches in various benchmarks.
Abstract
Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters to become transferable to the real world in a zero-shot setting. However, when the range of randomized parameters is extensive, policy updates become unstable, and a huge number of samples are often required to learn an effective policy. To alleviate this problem, we propose a sample-efficient method named cyclic policy distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each one. Then local policies are learned while cyclically transitioning to sub-domains. CPD accelerates learning through knowledge transfer based on expected performance improvements. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfer.…
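The final distillation step can be sketched as supervised regression of one global policy onto the actions of the local policies. This is a deliberately simplified illustration, not the paper's procedure: the one-dimensional linear policies, the squared-error objective, and the name `distill_linear` are all assumptions, since the abstract does not specify the distillation loss or the policy class.

```python
from typing import Callable, Sequence

Policy = Callable[[float], float]


def distill_linear(
    local_policies: Sequence[Policy],
    states_per_domain: Sequence[Sequence[float]],
) -> float:
    """Fit a global policy a = w * s to the local policies' actions.

    Minimizes the summed squared error between the global action and each
    local policy's action on that policy's own states (closed-form solution).
    """
    num = 0.0
    den = 0.0
    for policy, states in zip(local_policies, states_per_domain):
        for s in states:
            a = policy(s)  # teacher action from the local policy
            num += s * a
            den += s * s
    if den == 0.0:
        raise ValueError("need at least one non-zero state")
    return num / den


# Two hypothetical local policies with gains 1.0 and 3.0, one per sub-domain.
local_policies = [lambda s: 1.0 * s, lambda s: 3.0 * s]
states = [[-1.0, 0.5, 2.0], [-1.0, 0.5, 2.0]]
w = distill_linear(local_policies, states)  # global gain, here 2.0
```

Because both local policies are evaluated on the same states here, the fitted global gain is the average of the two local gains. A practical implementation would more likely use neural-network policies and a divergence between action distributions.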
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning
