PHUMA: Physically-Grounded Humanoid Locomotion Dataset
Kyungmin Lee, Sibeen Kim, Minho Park, Hyunseung Kim, Dongyoon Hwang, Hojoon Lee, Jaegul Choo

TL;DR
PHUMA is a new physically-grounded humanoid locomotion dataset that improves motion imitation by addressing artifacts in large-scale video-based data, enabling more reliable and diverse humanoid behaviors.
Contribution
The paper introduces PHUMA, a physically-grounded humanoid dataset that enhances motion quality through careful data curation and physics-constrained retargeting, surpassing existing datasets.
Findings
Policies trained on PHUMA outperform policies trained on existing datasets in motion imitation tasks.
PHUMA enables stable imitation of diverse human motions.
The dataset improves path following and motion diversity in humanoid locomotion.
Abstract
Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale, while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We…
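The three artifacts named in the abstract (floating, penetration, and foot skating) can be illustrated with a small detector over foot trajectories. This is a minimal sketch, not PHUMA's curation or retargeting code: the function name `artifact_report`, the two-feet-per-frame layout, and every threshold value are hypothetical choices made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]   # (x, y, z) in metres, z points up
Frame = Tuple[Vec3, Vec3]           # (left foot, right foot) positions


@dataclass
class ArtifactReport:
    floating_ratio: float      # fraction of frames with both feet above the ground
    penetration_ratio: float   # fraction of frames with a foot below the ground
    skating_ratio: float       # fraction of frame transitions with a sliding contact foot


def artifact_report(
    frames: List[Frame],
    dt: float,
    ground_z: float = 0.0,
    float_tol: float = 0.05,    # assumed threshold, not taken from the paper
    pen_tol: float = 0.01,      # assumed threshold, not taken from the paper
    contact_tol: float = 0.02,  # assumed threshold, not taken from the paper
    skate_speed: float = 0.1,   # assumed threshold in m/s, not taken from the paper
) -> ArtifactReport:
    if len(frames) < 2 or dt <= 0:
        raise ValueError("need at least two frames and a positive dt")

    floating = penetrating = skating = 0

    # Per-frame checks use the lowest foot relative to the ground plane.
    # Note: this simple floating test also flags genuine flight phases (jumps).
    for left, right in frames:
        lowest = min(left[2], right[2]) - ground_z
        if lowest > float_tol:
            floating += 1
        elif lowest < -pen_tol:
            penetrating += 1

    def in_contact(foot: Vec3) -> bool:
        return abs(foot[2] - ground_z) <= contact_tol

    # A foot that stays in contact across two frames should not move horizontally.
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            if in_contact(a) and in_contact(b):
                speed = math.hypot(b[0] - a[0], b[1] - a[1]) / dt
                if speed > skate_speed:
                    skating += 1
                    break  # count each transition at most once

    n = len(frames)
    return ArtifactReport(floating / n, penetrating / n, skating / (n - 1))
```

Under these assumptions, a clip could be screened by calling `artifact_report(frames, dt)` and rejecting it when any ratio exceeds a chosen budget. Measuring artifacts in this way only detects them; the abstract describes PHUMA as additionally correcting them through physics-constrained retargeting.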
Peer Reviews
Decision·Submitted to ICLR 2026
- Data curation and processing pipeline is well-justified and thoroughly explained.
- PhySINK has good quantitative comparison to SINK and Mink's IK as appropriate baselines.
- Method identifies real problems with existing video-to-humanoid motion pipelines that lead to poor downstream performance during RL training and deployment.
- The dataset is relatively large, comparing favorably against popular datasets like AMASS.
- Data pipeline validated on multiple humanoid robot form factors.
- 1.2x success rate improvement over AMASS isn't very dramatic.
- "Physically grounded" in this paper means the motions have kinematic plausibility as defined by heuristics and loosely verified by downstream sim RL training. I think this is somewhat misleading wording, since I would interpret "physically grounded" to mean that it is simulated dynamically in the data processing loop.
- A sim-to-real deployment would be a stronger validation of the data pipeline.
1. The paper focuses on physically grounded humanoid motion, addressing a gap in large-scale imitation datasets where stability and contact consistency are often ignored.
2. The proposed PhySINK pipeline is technically solid and produces cleaner motion data with fewer artifacts than Humanoid-X.
3. The experiments are comprehensive within simulation, demonstrating clear quantitative improvements on multiple humanoid platforms and providing a useful dataset that can benefit future physics-based
1. While the paper presents a clean and useful dataset, its novelty appears somewhat limited relative to recent works that also improve over Humanoid-X through physics-aware retargeting. Methods such as ASAP (RSS 2025), KungfuBot (NeurIPS 2025), and GMR have already introduced substantial innovations in motion retargeting and data cleaning: ASAP integrates RL-based physical simulation during retargeting, KungfuBot applies extensive filtering for realistic contacts, and GMR focuses specifically on r
- The curated PHUMA is superior to existing efforts in scale and diversity.
- The proposed pipeline seems reasonable and practical.
- The proposed pipeline and dataset could serve as valuable resources for humanoid learning.
- A noticeable characteristic of PHUMA is its heterogeneous data sources. Extending the existing data quality analysis and performance analysis to data from different sources would be preferred. A separate evaluation of mocap-sourced and video-sourced PHUMA would be helpful.
- Comparisons between the proposed PhySINK and related works, like GMR, are missing.
- Though the comparison of heuristic metrics in Table 2 is straightforward, a missed chance would be investigating the relationship betw
