How does the structure embedded in learning policy affect learning quadruped locomotion?
Kuangen Zhang, Jongwoo Lee, Zhimin Hou, Clarence W. de Silva, Chenglong Fu, Neville Hogan

TL;DR
This paper investigates how the level of structure in learned policies influences quadruped locomotion performance, showing that more structured policies train faster and are more robust than direct policies.
Contribution
It provides a quantitative analysis of the impact of policy structure on learning efficiency and robustness in quadruped locomotion tasks.
Findings
Structured policies require fewer training steps than direct policies.
Highly structured policies are more robust to disturbances.
Embedding structure significantly improves learning efficiency.
Abstract
Reinforcement learning (RL) is a popular data-driven method that has demonstrated great success in robotics. Previous works usually focus on learning an end-to-end (direct) policy to directly output joint torques. While the direct policy seems convenient, the resultant performance may not meet our expectations. To improve its performance, more sophisticated reward functions or more structured policies can be utilized. This paper focuses on the latter because the structured policy is more intuitive and can inherit insights from previous model-based controllers. It is unsurprising that the structure, such as a better choice of the action space and constraints of motion trajectory, may benefit the training process and the final performance of the policy at the cost of generality, but the quantitative effect is still unclear. To analyze the effect of the structure quantitatively, this paper…
Taxonomy
Topics: Robotic Locomotion and Control · Robot Manipulation and Learning · Reinforcement Learning in Robotics
