Benchmarking Humanoid Imitation Learning with Motion Difficulty
Zhaorui Meng, Lu Yin, Xinrui Chen, Anjun Chen, Shihui Guo, Yipeng Qin

TL;DR
This paper introduces the Motion Difficulty Score (MDS), a metric based on torque variation that quantifies the inherent difficulty of humanoid motions independently of policy performance, so that evaluation of imitation learning can separate hard motions from weak policies.
Contribution
The work proposes MDS, a physics-based difficulty metric, and demonstrates its effectiveness in evaluating and understanding motion imitation policies and datasets.
Findings
MDS correlates with policy performance across different motions.
MDS-based metrics reveal insights into imitation learning challenges.
The MD-AMASS dataset is partitioned by difficulty using MDS.
Abstract
Physics-based motion imitation is central to humanoid control, yet current evaluation metrics (e.g., joint position error) measure only how well a policy imitates, not how difficult the motion itself is. This conflates policy performance with motion difficulty, obscuring whether failures stem from poor learning or inherently challenging motions. In this work, we address this gap with the Motion Difficulty Score (MDS), a novel metric that defines and quantifies imitation difficulty independent of policy performance. Grounded in rigid-body dynamics, MDS interprets difficulty as the torque variation induced by small pose perturbations: larger torque-to-pose variation yields flatter reward landscapes and thus higher learning difficulty. MDS captures this through three properties of the perturbation-induced torque space: volume, variance, and temporal variability. We also use it to construct…
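The idea of measuring torque variation under small pose perturbations can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the MDS formulas, so everything here is an assumption made for illustration. The two-link planar arm with gravity-only torques, the Gaussian perturbation scale `sigma`, and the stand-ins for the three properties (log-determinant of the torque covariance for volume, its trace for variance, and the mean frame-to-frame change of that trace for temporal variability) are all hypothetical, as are the function names.

```python
import numpy as np


def toy_gravity_torque(q, link_len=1.0, mass=1.0, g=9.81):
    """Static gravity torques of a planar 2-link arm (toy model, not a humanoid).

    Angles are measured from the horizontal; a point mass sits at each link tip.
    """
    q1, q2 = q
    t2 = mass * g * link_len * np.cos(q1 + q2)
    t1 = 2.0 * mass * g * link_len * np.cos(q1) + t2
    return np.array([t1, t2])


def perturbed_torque_cov(q, rng, n_samples=256, sigma=0.01):
    """Covariance of torques obtained by adding small Gaussian noise to pose q."""
    noise = rng.normal(scale=sigma, size=(n_samples, q.shape[0]))
    torques = np.array([toy_gravity_torque(q + d) for d in noise])
    return np.cov(torques, rowvar=False)


def toy_difficulty_summary(motion, rng):
    """Summarise a motion of shape (T, 2) with three assumed proxies.

    volume: mean log-determinant of the per-frame torque covariance
    variance: mean trace of the per-frame torque covariance
    temporal_variability: mean absolute frame-to-frame change of the trace
    """
    log_volumes, spreads = [], []
    for q in motion:
        cov = perturbed_torque_cov(q, rng)
        # Small ridge term keeps the log-determinant finite for near-singular covariances.
        _, logdet = np.linalg.slogdet(cov + 1e-12 * np.eye(cov.shape[0]))
        log_volumes.append(logdet)
        spreads.append(np.trace(cov))
    spreads = np.asarray(spreads)
    temporal = float(np.mean(np.abs(np.diff(spreads)))) if len(spreads) > 1 else 0.0
    return {
        "volume": float(np.mean(log_volumes)),
        "variance": float(np.mean(spreads)),
        "temporal_variability": temporal,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical motion: the first joint swings from horizontal to vertical.
    swing = np.stack([np.linspace(0.0, np.pi / 2, 20), np.zeros(20)], axis=1)
    print(toy_difficulty_summary(swing, rng))
```

In this toy model, poses where the torque changes quickly with joint angle produce a larger perturbation-induced spread than poses where it is locally flat, which is the kind of pose-to-torque sensitivity the abstract describes. How the paper actually combines the three properties into a single score is not stated in the text above.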
Taxonomy
Topics: Robot Manipulation and Learning · Human Motion and Animation · Human Pose and Action Recognition
