TDMPBC: Self-Imitative Reinforcement Learning for Humanoid Robot Control

Zifeng Zhuang; Diyuan Shi; Runze Suo; Xiao He; Hongyin Zhang; Ting Wang; Shangke Lyu; Donglin Wang

arXiv:2502.17322·cs.RO·February 25, 2025

TL;DR

This paper introduces SIRL, a reinforcement learning framework for humanoid robots that imitates task-relevant trajectories to improve learning efficiency and success rates in high-dimensional spaces where the feasible regions are narrow.

Contribution

The paper proposes SIRL, a novel RL approach that dynamically adjusts imitation based on trajectory relevance, significantly enhancing performance in complex humanoid control tasks.

Findings

01 Achieves 120% performance improvement on HumanoidBench

02 Effectively focuses exploration on task-relevant regions

03 Results show meaningful behavior improvements and task success

Abstract

Complex high-dimensional spaces with high Degree-of-Freedom and complicated action spaces, such as humanoid robots equipped with dexterous hands, pose significant challenges for reinforcement learning (RL) algorithms, which need to wisely balance exploration and exploitation under limited sample budgets. In general, feasible regions for accomplishing tasks within complex high-dimensional spaces are exceedingly narrow. For instance, in the context of humanoid robot motion control, the vast majority of space corresponds to falling, while only a minuscule fraction corresponds to standing upright, which is conducive to the completion of downstream tasks. Once the robot explores into a potentially task-relevant region, it should place greater emphasis on the data within that region. Building on this insight, we propose the $S$elf-$I$mitative $R$einforcement…
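The idea of placing greater emphasis on data from task-relevant regions can be illustrated with a toy sketch. This is not the paper's algorithm, since the abstract above is cut off before the method is described. It assumes, purely for illustration, a return-weighted behavior cloning loss in which trajectories with higher episode return receive larger imitation weights. The names `trajectory_weights`, `weighted_bc_loss`, and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence


def trajectory_weights(returns: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over episode returns (an assumed weighting, not taken from the paper).

    Higher-return trajectories receive larger weights; the weights sum to 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(returns)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in returns]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_bc_loss(
    policy_actions: Sequence[Sequence[float]],
    data_actions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted mean squared error between policy actions and stored actions.

    Each entry holds one action vector per trajectory (a simplification).
    """
    loss = 0.0
    for w, pa, da in zip(weights, policy_actions, data_actions):
        mse = sum((p - d) ** 2 for p, d in zip(pa, da)) / len(pa)
        loss += w * mse
    return loss


if __name__ == "__main__":
    # Two episodes that fell early and one that stayed upright (made-up returns).
    episode_returns = [1.0, 1.0, 50.0]
    w = trajectory_weights(episode_returns, temperature=10.0)
    print("imitation weights:", w)
```

With a small `temperature` the weighting concentrates almost entirely on the best trajectory, while a large value approaches uniform imitation; how such a trade-off is handled in SIRL itself is not stated in the text shown here.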

Code & Models

Repositories

carlosferrazza/humanoid-bench (jax, Official)

Taxonomy

Topics: Reinforcement Learning in Robotics