Residual Policy Learning for Perceptive Quadruped Control Using Differentiable Simulation

Jing Yuan Luo; Yunlong Song; Victor Klemm; Fan Shi; Davide Scaramuzza; Marco Hutter

arXiv:2410.03076·cs.RO·October 7, 2024

Open Access

TL;DR

This paper combines residual policy learning with First-order Policy Gradient (FoPG) algorithms for quadruped robot control. Learning a residual over a simple baseline policy mainly improves asymptotic rewards in contact-rich locomotion, and the paper also gives insights on applying FoPG algorithms to pixel-based navigation.

Contribution

The paper proposes learning a residual over a simple baseline policy to guide First-order Policy Gradient (FoPG) algorithms, improving asymptotic rewards and enabling rapid end-to-end training for quadruped locomotion and navigation.

Findings

01. FoPG RPL improves asymptotic rewards in quadruped locomotion.

02. FoPG algorithms can be effectively applied to pixel-based navigation tasks.

03. End-to-end training of quadruped control is achieved within minutes.

Abstract

First-order Policy Gradient (FoPG) algorithms such as Backpropagation through Time and Analytical Policy Gradients leverage local simulation physics to accelerate policy search, significantly improving sample efficiency in robot control compared to standard model-free reinforcement learning. However, FoPG algorithms can exhibit poor learning dynamics in contact-rich tasks like locomotion. Previous approaches address this issue by alleviating contact dynamics via algorithmic or simulation innovations. In contrast, we propose guiding the policy search by learning a residual over a simple baseline policy. For quadruped locomotion, we find that the role of residual policy learning in FoPG-based training (FoPG RPL) is primarily to improve asymptotic rewards, compared to improving sample efficiency for model-free RL. Additionally, we provide insights on applying FoPGs to pixel-based local…
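The core idea in the abstract, a learned residual added to a fixed baseline and trained with gradients taken through the simulator, can be illustrated with a toy sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the 1D point mass, the PD baseline gains `kp` and `kd`, the linear residual `theta = (w1, w2, b)`, and the quadratic tracking cost. The gradient is computed by propagating state sensitivities forward in time, which yields the same first-order gradient that Backpropagation through Time would give in reverse mode.

```python
def rollout(theta, kp=2.0, kd=1.0, dt=0.05, steps=100, x0=1.0, v0=0.0):
    """Simulate a 1D point mass; return (loss, dloss/dtheta).

    theta = (w1, w2, b) parameterises a hypothetical linear residual.
    """
    w1, w2, b = theta
    x, v = x0, v0
    dx = [0.0, 0.0, 0.0]  # sensitivities dx/dtheta
    dv = [0.0, 0.0, 0.0]  # sensitivities dv/dtheta
    loss, grad = 0.0, [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Accumulate the tracking cost x^2 and its gradient.
        loss += dt * x * x
        grad = [grad[i] + dt * 2.0 * x * dx[i] for i in range(3)]
        # Residual composition: action = baseline + learned correction.
        u_base = -kp * x - kd * v
        u_res = w1 * x + w2 * v + b
        u = u_base + u_res
        # Chain rule through the policy: direct term plus state dependence.
        direct = (x, v, 1.0)
        du = [direct[i] + (w1 - kp) * dx[i] + (w2 - kd) * dv[i]
              for i in range(3)]
        # Explicit Euler step for the state and its sensitivities.
        x, v = x + dt * v, v + dt * u
        dx, dv = ([dx[i] + dt * dv[i] for i in range(3)],
                  [dv[i] + dt * du[i] for i in range(3)])
    return loss, grad


def train(theta=(0.0, 0.0, 0.0), lr=0.1, iters=20):
    """Plain gradient descent on the residual parameters only."""
    theta = list(theta)
    for _ in range(iters):
        _, grad = rollout(theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

Starting the residual at zero means the combined policy initially reproduces the baseline exactly, so optimisation begins from a reasonable behaviour instead of from scratch. The baseline gains stay fixed throughout; only `theta` is updated.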

Taxonomy

Topics: Iterative Learning Control Systems · Advanced Control Systems Optimization