TL;DR
This paper demonstrates that randomizing kinematic parameters during simulation training improves the transfer of reinforcement learning policies to real robots, and introduces a new adaptation algorithm based on this insight.
Contribution
The authors show that randomizing kinematic parameters transfers better than randomizing dynamic parameters, and propose Multi-Policy Bayesian Optimization for efficient policy adaptation.
Findings
Kinematic randomization outperforms dynamic randomization in policy transfer.
The proposed algorithm adapts policies with limited target environment data.
Experiments on a quadruped robot validate the approach across diverse environments.
Abstract
Transferring reinforcement learning policies trained in physics simulation to real hardware remains a challenge, known as the "sim-to-real" gap. Domain randomization is a simple yet effective technique to address dynamics discrepancies across source and target domains, but its success generally depends on heuristics and trial-and-error. In this work, we investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies. Contrary to common practice in which kinematic parameters are carefully measured while dynamic parameters are randomized, we found that virtually randomizing kinematic parameters (e.g., link lengths) during training in simulation generally outperforms dynamic randomization. Based on this finding, we introduce a new domain adaptation algorithm that utilizes simulated kinematic parameter variation. Our…
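To make the idea of kinematic randomization concrete, here is a minimal sketch of how link lengths might be resampled at the start of each training episode. It is an illustration only, not the authors' implementation: the names `LegKinematics` and `sample_kinematics`, the nominal link lengths, and the ±20% scaling range are all hypothetical assumptions, and a real pipeline would pass the sampled values to the simulator's robot model.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LegKinematics:
    """Hypothetical container for one leg's link lengths, in meters."""
    upper_link_m: float
    lower_link_m: float


# Assumed nominal values for illustration; not taken from the paper.
NOMINAL = LegKinematics(upper_link_m=0.20, lower_link_m=0.20)


def sample_kinematics(nominal, scale_range=(0.8, 1.2), rng=None):
    """Return a copy of `nominal` with each link length scaled independently.

    Each scale factor is drawn uniformly from `scale_range` (an assumed
    range, chosen only to illustrate the idea).
    """
    if rng is None:
        rng = random
    lo, hi = scale_range
    return LegKinematics(
        upper_link_m=nominal.upper_link_m * rng.uniform(lo, hi),
        lower_link_m=nominal.lower_link_m * rng.uniform(lo, hi),
    )


# One sampled kinematic configuration per training episode.
rng = random.Random(0)
episode_params = [sample_kinematics(NOMINAL, rng=rng) for _ in range(5)]
for params in episode_params:
    print(params)
```

Dynamic randomization would follow the same pattern but perturb quantities such as mass or friction instead of link lengths.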
