Data Efficient Reinforcement Learning for Legged Robots
Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Tingnan Zhang, Jie Tan, Vikas Sindhwani

TL;DR
This paper introduces a data-efficient, model-based reinforcement learning framework for quadruped robot locomotion. It achieves walking from only 4.5 minutes of robot data, and the same learned dynamics can be reused for different tasks by changing the reward function.
Contribution
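The authors combine three ideas: a multi-step loss function that trains the dynamics model to stay accurate over a long horizon, a model predictive control scheme that accounts for planning latency so the learned model can be used for real-time control, and an action space that embeds prior knowledge of leg trajectories to keep exploration safe. Together these make it possible to learn locomotion from a small amount of robot data.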
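The idea behind a long-horizon loss can be illustrated with a small sketch. This summary does not give the paper's exact formulation, so the code below assumes a plain sum of squared errors over an open-loop rollout; the `predict` interface and the toy linear system are hypothetical and used only for illustration.

```python
import numpy as np

def multi_step_loss(predict, states, actions):
    """Mean squared error of an open-loop model rollout against a recorded trajectory.

    predict: callable (state, action) -> next state (hypothetical model interface)
    states:  array of shape (H + 1, state_dim), recorded states s_0..s_H
    actions: array of shape (H, action_dim), recorded actions a_0..a_{H-1}
    """
    pred = states[0]
    total = 0.0
    for t in range(len(actions)):
        # Feed the model its own previous prediction, not the recorded state,
        # so errors compound the way they would during planning.
        pred = predict(pred, actions[t])
        total += np.sum((pred - states[t + 1]) ** 2)
    return total / len(actions)

def make_linear_model(A, B):
    """Toy linear dynamics s' = A s + B a (an assumption, not the paper's model)."""
    return lambda s, a: A @ s + B @ a

A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
B_true = np.array([[0.0], [0.1]])
true_model = make_linear_model(A_true, B_true)

# Record a 20-step trajectory from the toy system.
rng = np.random.default_rng(0)
actions = rng.normal(size=(20, 1))
states = np.zeros((21, 2))
for t in range(20):
    states[t + 1] = true_model(states[t], actions[t])

exact_loss = multi_step_loss(true_model, states, actions)
biased_loss = multi_step_loss(make_linear_model(A_true * 1.02, B_true), states, actions)
```

A one-step loss would reset `pred` to the recorded state at every step. Rolling the prediction forward instead penalizes small model errors that grow over the horizon, which is the situation a planner faces.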
Findings
Achieved stable quadruped walking with only 4.5 minutes of data
Outperformed existing model-free methods in sample efficiency by over an order of magnitude
Reused the same learned dynamics for different tasks by changing only the reward function
Abstract
We present a model-based framework for robot locomotion that achieves walking based on only 4.5 minutes (45,000 control steps) of data collected on a quadruped robot. To accurately model the robot's dynamics over a long horizon, we introduce a loss function that tracks the model's prediction over multiple timesteps. We adapt model predictive control to account for planning latency, which allows the learned model to be used for real time control. Additionally, to ensure safe exploration during model learning, we embed prior knowledge of leg trajectories into the action space. The resulting system achieves fast and robust locomotion. Unlike model-free methods, which optimize for a particular task, our planner can use the same learned dynamics for various tasks, simply by changing the reward function. To the best of our knowledge, our approach is more than an order of magnitude more sample…
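One way to account for planning latency in model predictive control is to plan from the state the robot is predicted to reach once planning finishes, rather than from the state it is in when planning starts. The sketch below illustrates that idea under stated assumptions: the point-mass `toy_model`, the three-step latency, and the random-shooting planner are all stand-ins chosen for brevity and are not taken from the paper.

```python
import numpy as np

def predict_start_state(model, state, pending_actions):
    """Roll the model through the actions that will execute while planning runs."""
    for a in pending_actions:
        state = model(state, a)
    return state

def random_shooting_plan(model, reward, start_state, horizon, n_samples, rng):
    """Return the best of n_samples random action sequences (stand-in planner)."""
    best_return, best_seq = -np.inf, None
    for _ in range(n_samples):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, 1))
        s, ret = start_state, 0.0
        for a in seq:
            s = model(s, a)
            ret += reward(s, a)
        if ret > best_return:
            best_return, best_seq = ret, seq
    return best_seq

def toy_model(s, a):
    """Hypothetical 1D point mass: state = [position, velocity], dt = 0.1."""
    pos, vel = s
    vel = vel + 0.1 * a[0]
    return np.array([pos + 0.1 * vel, vel])

def forward_velocity_reward(s, a):
    """Swap this function to change the task without relearning the model."""
    return s[1]

rng = np.random.default_rng(0)
current_state = np.array([0.0, 0.0])
# Actions from the previous plan that keep running during an assumed 3-step latency.
pending = np.full((3, 1), 1.0)
start = predict_start_state(toy_model, current_state, pending)
plan = random_shooting_plan(
    toy_model, forward_velocity_reward, start, horizon=5, n_samples=64, rng=rng
)
```

Because the task enters only through `reward`, passing a different reward function to the planner changes the behavior while the dynamics model stays the same.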
Taxonomy
Topics: Robotic Locomotion and Control · Prosthetics and Rehabilitation Robotics · Reinforcement Learning in Robotics
