Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie
Zhaoming Xie, Patrick Clary, Jeremy Dao, Pedro Morais, Jonathan Hurst, Michiel van de Panne

TL;DR
This paper introduces an iterative reinforcement learning method that allows the reward function to be redesigned between iterations and supports robust policy transfer for the legged robot Cassie, achieving stable, variable-speed walking without dynamics randomization.
Contribution
It presents a novel iterative RL framework that uses Deterministic Action Stochastic State (DASS) tuples for policy design, transfer, and distillation, improving the development of legged robot locomotion.
Findings
Successful transfer of policies from simulation to the physical robot
Stable walking with multiple gait styles at various speeds
Effective policy distillation with small datasets
Abstract
Deep reinforcement learning (DRL) is a promising approach for developing legged locomotion skills. However, the iterative design process that is inevitable in practice is poorly supported by the default methodology. It is difficult to predict the outcomes of changes made to the reward functions, policy architectures, and the set of tasks being trained on. In this paper, we propose a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration. We characterize policies via sets of Deterministic Action Stochastic State (DASS) tuples, which represent the deterministic policy state-action pairs as sampled from the states visited by the trained stochastic policy. New policies are trained using a policy gradient algorithm which then mixes RL-based policy gradients with gradient updates…
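The DASS idea described in the abstract can be illustrated with a minimal sketch: states are visited by rolling out the stochastic policy, but the action recorded for each state is the deterministic (mean) action. The example below is not the authors' implementation. The linear Gaussian policy, the toy dynamics, and the names `collect_dass` and `distill` are all hypothetical, chosen only to keep the example small. It also shows a least-squares fit of a student policy to the collected tuples as one simple way distillation from such a dataset might look.

```python
import numpy as np

def collect_dass(W, sigma, n_steps, rng, state_dim=4):
    """Roll out a stochastic policy and record (state, deterministic action) pairs.

    Hypothetical setup: the policy mean is linear, mean(s) = W @ s, and the
    environment is a bounded toy system, not a Cassie simulator.
    """
    B = np.arange(state_dim * W.shape[0]).reshape(state_dim, W.shape[0]) / 8.0
    s = rng.normal(size=state_dim)
    states, actions = [], []
    for _ in range(n_steps):
        mean = W @ s                      # deterministic action for this state
        states.append(s.copy())
        actions.append(mean)              # the DASS tuple stores the mean action
        # The noisy (stochastic) action is what actually drives the rollout,
        # so the recorded states come from the stochastic policy's distribution.
        a = mean + sigma * rng.normal(size=mean.shape)
        s = 0.5 * s + 0.05 * np.tanh(B @ a) + 0.1 * rng.normal(size=state_dim)
    return np.array(states), np.array(actions)

def distill(states, actions):
    """Fit a linear student policy to DASS tuples by ordinary least squares."""
    X, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return X.T                            # shape (action_dim, state_dim)

rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(2, 4))       # hypothetical teacher policy weights
S, A_det = collect_dass(W_teacher, sigma=0.1, n_steps=200, rng=rng)
W_student = distill(S, A_det)
```

Because the teacher in this toy setting is exactly linear, the student recovers its weights from a few hundred tuples. A neural-network policy would instead be trained by minimizing a supervised loss over the tuples, and how the paper combines such updates with policy gradients is not covered by this sketch.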
Taxonomy
Topics: Robotic Locomotion and Control · Reinforcement Learning in Robotics · Prosthetics and Rehabilitation Robotics
