TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion

Khang Nguyen, Khai Nguyen, An T. Le, Jan Peters, Manfred Huber, Ngo Anh Vien, and Minh Nhat Vu

arXiv:2505.13549 · cs.RO · May 21, 2025

TL;DR

TD-GRPC enhances humanoid robot locomotion learning by integrating group-relative policy constraints with trust-region methods, improving stability, robustness, and efficiency in complex control tasks.

Contribution

Introduces TD-GRPC, a novel extension of TD-MPC that combines group-relative policy optimization with explicit policy constraints for improved humanoid locomotion.

Findings

01. Demonstrates improved stability and robustness in humanoid control tasks.

02. Achieves higher sample efficiency during training.

03. Successfully handles complex, dynamic movements on a 26-DoF robot.

Abstract

Robot learning in high-dimensional control settings, such as humanoid locomotion, presents persistent challenges for reinforcement learning (RL) algorithms due to unstable dynamics, complex contact interactions, and sensitivity to distributional shifts during training. Model-based methods, e.g., Temporal-Difference Model Predictive Control (TD-MPC), have demonstrated promising results by combining short-horizon planning with value-based learning, enabling efficient solutions for basic locomotion tasks. However, these approaches remain ineffective in addressing policy mismatch and instability introduced by off-policy updates. Thus, in this work, we introduce Temporal-Difference Group Relative Policy Constraint (TD-GRPC), an extension of the TD-MPC framework that unifies Group Relative Policy Optimization (GRPO) with explicit Policy Constraints (PC). TD-GRPC applies a…
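The abstract names GRPO but is cut off before it says how TD-GRPC applies it. As a rough illustration of the general group-relative idea only, the sketch below normalizes a group of scores against the group's own mean and standard deviation. The function name `group_relative_advantages`, the `eps` parameter, and the example scores are assumptions made for illustration; they are not taken from the paper, and the scoring signal TD-GRPC actually uses is not stated in the text above.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def group_relative_advantages(scores: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each score against its own group (GRPO-style normalization).

    Hypothetical helper: `scores` stands in for whatever per-sample signal is
    used to rank a group of candidates drawn for the same state.
    """
    if len(scores) == 0:
        raise ValueError("need at least one score")
    mean = fmean(scores)
    std = pstdev(scores)  # population standard deviation of the group
    # eps keeps the division finite when every score in the group is equal.
    return [(s - mean) / (std + eps) for s in scores]


# Hypothetical scores for four candidates sampled at one state.
example_scores = [1.0, 2.0, 3.0, 4.0]
advantages = group_relative_advantages(example_scores)
```

Because each value is measured relative to its own group, candidates that score above the group mean get positive advantages and those below get negative ones, independent of the absolute scale of the scores.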

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation