PHUMA: Physically-Grounded Humanoid Locomotion Dataset

Kyungmin Lee; Sibeen Kim; Minho Park; Hyunseung Kim; Dongyoon Hwang; Hojoon Lee; Jaegul Choo

arXiv:2510.26236·cs.RO·October 31, 2025

TL;DR

PHUMA is a new physically-grounded humanoid locomotion dataset that improves motion imitation by addressing artifacts in large-scale video-based data, enabling more reliable and diverse humanoid behaviors.

Contribution

The paper introduces PHUMA, a physically-grounded humanoid dataset that improves motion quality through careful data curation and physics-constrained retargeting, so that policies trained on it outperform those trained on existing datasets.

Findings

01

PHUMA-trained policies outperform policies trained on existing datasets in motion imitation tasks.

02

PHUMA enables stable imitation of diverse human motions.

03

The dataset improves path following and motion diversity in humanoid locomotion.

Abstract

Motion imitation is a promising approach for humanoid locomotion, enabling agents to acquire humanlike behaviors. Existing methods typically rely on high-quality motion capture datasets such as AMASS, but these are scarce and expensive, limiting scalability and diversity. Recent studies attempt to scale data collection by converting large-scale internet videos, exemplified by Humanoid-X. However, they often introduce physical artifacts such as floating, penetration, and foot skating, which hinder stable imitation. In response, we introduce PHUMA, a Physically-grounded HUMAnoid locomotion dataset that leverages human video at scale, while addressing physical artifacts through careful data curation and physics-constrained retargeting. PHUMA enforces joint limits, ensures ground contact, and eliminates foot skating, producing motions that are both large-scale and physically reliable. We…
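The three artifacts named in the abstract (floating, penetration, and foot skating) can be illustrated with a small per-frame check on foot trajectories. The sketch below is not the paper's curation or retargeting method. The function name, the thresholds, the z-up convention, and the flat ground at z = 0 are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical thresholds, chosen for illustration only (not taken from the paper).
FLOAT_TOL = 0.02        # metres: both feet above this height counts as floating
PENETRATION_TOL = 0.01  # metres: a foot below -PENETRATION_TOL counts as penetration
SKATE_TOL = 0.01        # metres of horizontal drift per frame allowed while in contact

Foot = Tuple[float, float, float]  # (x, y, z), assuming z is up and the ground is z = 0


@dataclass
class ArtifactReport:
    floating: float     # fraction of frames where both feet are off the ground
    penetration: float  # fraction of frames where a foot is below the ground
    skating: float      # fraction of frames where a planted foot slides


def _in_contact(z: float) -> bool:
    """A foot is treated as planted when its height is within the tolerances."""
    return -PENETRATION_TOL <= z <= FLOAT_TOL


def artifact_report(left: Sequence[Foot], right: Sequence[Foot]) -> ArtifactReport:
    """Count simple kinematic artifacts over a clip of left/right foot positions."""
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("left and right must be non-empty and of equal length")

    floating = penetration = skating = 0
    for t in range(n):
        lowest = min(left[t][2], right[t][2])
        if lowest > FLOAT_TOL:
            floating += 1
        if lowest < -PENETRATION_TOL:
            penetration += 1
        if t > 0:
            for foot in (left, right):
                x0, y0, z0 = foot[t - 1]
                x1, y1, z1 = foot[t]
                drift = math.hypot(x1 - x0, y1 - y0)
                # Skating: the foot stays planted across two frames but slides.
                if _in_contact(z0) and _in_contact(z1) and drift > SKATE_TOL:
                    skating += 1
                    break  # count each frame at most once
    return ArtifactReport(floating / n, penetration / n, skating / n)
```

A clip could then be kept or discarded by comparing each fraction against a cutoff. Any such cutoff would again be a design choice, not something stated in the source.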

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- Data curation and processing pipeline is well-justified and thoroughly explained.
- PhySINK has good quantitative comparison to SINK and Mink's IK as appropriate baselines.
- Method identifies real problems with existing video-to-humanoid motion pipelines that lead to poor downstream performance during RL training and deployment.
- The dataset is relatively large, comparing favorably against popular datasets like AMASS.
- Data pipeline validated on multiple humanoid robot form factors.

Weaknesses

- 1.2x success rate improvement over AMASS isn't very dramatic.
- "Physically grounded" in this paper means the motions have kinematic plausibility as defined by heuristics and loosely verified by downstream sim RL training. I think this is somewhat misleading wording, since I would interpret "physically grounded" to mean that it is simulated dynamically in the data processing loop.
- A sim-to-real deployment would be a stronger validation of the data pipeline.

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The paper focuses on physically grounded humanoid motion, addressing a gap in large-scale imitation datasets where stability and contact consistency are often ignored.
2. The proposed PhySINK pipeline is technically solid and produces cleaner motion data with fewer artifacts than Humanoid-X.
3. The experiments are comprehensive within simulation, demonstrating clear quantitative improvements on multiple humanoid platforms and providing a useful dataset that can benefit future physics-based

Weaknesses

1. While the paper presents a clean and useful dataset, its novelty appears somewhat limited relative to recent works that also improve over Humanoid-X through physics-aware retargeting. Methods such as ASAP (RSS 2025), KungfuBot (NeurIPS 2025), and GMR have already introduced substantial innovations in motion retargeting and data cleaning: ASAP integrates RL-based physical simulation during retargeting, KungfuBot applies extensive filtering for realistic contacts, and GMR focuses specifically on r

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The curated PHUMA is superior to existing efforts in scale and diversity.
- The proposed pipeline seems reasonable and practical.
- The proposed pipeline and dataset could serve as valuable resources for humanoid learning.

Weaknesses

- A noticeable characteristic of PHUMA is its heterogeneous data sources. Extending the existing data quality analysis and performance analysis to data from different sources would be preferred. A separate evaluation of mocap-sourced and video-sourced PHUMA would be helpful.
- Comparisons between the proposed PhySINK and related works, like GMR, are missing.
- Though the comparison of heuristic metrics in Table 2 is straightforward, a missed chance would be investigating the relationship betw

Code & Models

Datasets

DAVIAN-Robotics/PHUMA
dataset· 96 dl
