LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots
Peilin Wu, Weiji Xie, Jiahang Cao, Hang Lai, Weinan Zhang

TL;DR
LoopSR is a lifelong policy adaptation framework for legged robots that continuously refines reinforcement learning policies post-deployment by creating a digital twin of the real world, improving data efficiency and performance.
Contribution
The paper introduces LoopSR, a novel lifelong adaptation method using a transformer-based encoder and digital twin reconstruction for improved policy robustness.
Findings
Shows superior data efficiency in both sim-to-sim and sim-to-real experiments.
Achieves high performance with limited real-world data.
Effectively adapts policies post-deployment using continual simulation-based training.
Abstract
Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to enhance policy robustness across diverse environments, they can compromise the policy's performance in any specific environment, leading to suboptimal real-world deployment due to the No Free Lunch theorem. To address this, we propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space and reconstruct a digital twin of the real world for further improvement. An autoencoder architecture and contrastive learning methods are adopted to enhance feature extraction of real-world dynamics. Simulation parameters for continual…
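The abstract names contrastive learning as one ingredient for extracting features of real-world dynamics, but the excerpt does not say which loss is used. As a rough illustration only, the sketch below assumes an InfoNCE-style objective over paired trajectory embeddings. The function name `info_nce_loss`, the `temperature` value, and the pairing scheme (row i of `anchors` and row i of `positives` come from the same environment, every other row is a negative) are all hypothetical and are not taken from the paper.

```python
import math


def _normalize(vector):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of paired embeddings.

    Assumption (not from the paper): anchors[i] and positives[i] are latent
    embeddings of two trajectory segments collected in the same environment,
    and positives[j] for j != i serve as negatives for anchors[i].
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity between this anchor and every candidate, scaled
        # by the temperature.
        logits = [
            sum(x * y for x, y in zip(anchor, candidate)) / temperature
            for candidate in p
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the matching candidate.
        total += log_denominator - logits[i]
    return total / len(a)


# Two toy "environments" with orthogonal embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_loss = info_nce_loss(embeddings, embeddings)
swapped_loss = info_nce_loss(embeddings, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy case the loss is small when each anchor is paired with its own embedding and large when the pairs are swapped, which is the behavior a contrastive objective relies on to pull together segments from the same dynamics and push apart segments from different ones.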
Taxonomy
Topics: Robotic Locomotion and Control
Methods: Contrastive Learning
