Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization

Yuki Kadokawa; Lingwei Zhu; Yoshihisa Tsurumine; Takamitsu Matsubara

arXiv:2207.14561·cs.RO·April 11, 2023

TL;DR

Cyclic Policy Distillation (CPD) enhances sample efficiency in sim-to-real reinforcement learning by dividing parameter ranges into sub-domains, learning local policies cyclically, and distilling them into a global policy, demonstrated on multiple tasks.

Contribution

The paper introduces CPD, a novel method that improves sample efficiency in domain randomization for reinforcement learning by cyclically learning local policies and distilling them into a global policy.

Findings

01 CPD significantly reduces sample requirements in simulations.

02 CPD achieves effective sim-to-real transfer on robotic tasks.

03 The method outperforms baseline approaches in various benchmarks.

Abstract

Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters so that the policy becomes transferable to the real world in a zero-shot setting. However, when the range of randomized parameters is extensive, a huge number of samples is often required to learn an effective policy because policy updates become unstable. To alleviate this problem, we propose a sample-efficient method named cyclic policy distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each one. Then local policies are learned while cyclically transitioning to sub-domains. CPD accelerates learning through knowledge transfer based on expected performance improvements. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfers.…
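The first two steps in the abstract, splitting a randomized parameter range into sub-domains and visiting them cyclically, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_range` and `cyclic_schedule`, the equal-width split, the single scalar parameter, and the simple round-robin visiting order are all assumptions made for illustration. Local policy learning, knowledge transfer, and distillation are omitted.

```python
from itertools import cycle, islice
from typing import List, Tuple


def split_range(low: float, high: float, n_sub: int) -> List[Tuple[float, float]]:
    """Split [low, high] into n_sub contiguous, equal-width sub-domains.

    Equal widths are an assumption of this sketch, not a claim about CPD.
    """
    if n_sub < 1 or high <= low:
        raise ValueError("need n_sub >= 1 and high > low")
    width = (high - low) / n_sub
    # Build the interior edges, then pin the last edge to `high` exactly.
    edges = [low + i * width for i in range(n_sub)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def cyclic_schedule(n_sub: int, n_steps: int) -> List[int]:
    """Return sub-domain indices visited in round-robin order (hypothetical)."""
    return list(islice(cycle(range(n_sub)), n_steps))


# Example: one randomized parameter in [0, 1] split into 4 sub-domains.
sub_domains = split_range(0.0, 1.0, 4)
schedule = cyclic_schedule(len(sub_domains), 6)

for step, idx in enumerate(schedule):
    lo, hi = sub_domains[idx]
    # A local policy for sub-domain `idx` would be trained here on
    # simulations whose parameter is sampled from [lo, hi].
    print(f"step {step}: sub-domain {idx} covers [{lo}, {hi}]")
```

Each local policy only ever sees a narrow slice of the parameter range, which is the abstract's stated motivation for dividing the range in the first place.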

Code & Models

Repositories

yuki-kadokawa/cyclic-policy-distillation
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning