EgoPoseFormer v2: Accurate Egocentric Human Motion Estimation for AR/VR

Zhenyu Li; Sai Kumar Dwivedi; Filip Maric; Carlos Chacon; Nadine Bertsch; Filippo Arcadu; Tomas Hodan; Michael Ramamonjisoa; Peter Wonka; Amy Zhao; Robin Kips; Cem Keskin; Anastasia Tkach; Chenhongyi Yang

arXiv:2603.04090·cs.CV·March 5, 2026

TL;DR

EgoPoseFormer v2 pairs a transformer-based egocentric human motion estimation model with an auto-labeling system that trains on large unlabeled datasets, achieving high accuracy and temporal consistency for AR/VR applications.

Contribution

The paper presents a novel transformer-based model and an auto-labeling system for scalable, accurate egocentric human motion estimation in AR/VR.

Findings

01 Outperforms state-of-the-art by 12.2% and 19.4% in accuracy

02 Reduces temporal jitter by over 50%

03 Auto-labeling improves wrist MPJPE by 13.1%
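
For readers unfamiliar with the metric in the last finding, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below is a minimal NumPy illustration and is not taken from the paper: the function names, the array layout `(frames, joints, 3)`, and the reading of "improves by X%" as a relative reduction in error are all assumptions, and the numbers are toy values.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (frames, joints, 3), in the same units.
    Returns the Euclidean joint error averaged over frames and joints.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def relative_improvement(baseline_error, new_error):
    """Percentage reduction of an error metric relative to a baseline.

    Assumption: this is one common way to report "improves by X%".
    """
    return 100.0 * (baseline_error - new_error) / baseline_error

# Toy data: every predicted joint is offset by (3, 4, 0) from ground truth.
gt = np.zeros((2, 3, 3))
pred = gt + np.array([3.0, 4.0, 0.0])
print(mpjpe(pred, gt))                   # 5.0
print(relative_improvement(50.0, 40.0))  # 20.0
```

A wrist-only MPJPE would apply the same function to the slice of the joint axis that holds the wrist joints; which indices those are depends on the skeleton definition, which this page does not give.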

Abstract

Egocentric human motion estimation is essential for AR/VR experiences, yet remains challenging due to limited body coverage from the egocentric viewpoint, frequent occlusions, and scarce labeled data. We present EgoPoseFormer v2, a method that addresses these challenges through two key contributions: (1) a transformer-based model for temporally consistent and spatially grounded body pose estimation, and (2) an auto-labeling system that enables the use of large unlabeled datasets for training. Our model is fully differentiable, introduces identity-conditioned queries, multi-view spatial refinement, causal temporal attention, and supports both keypoints and parametric body representations under a constant compute budget. The auto-labeling system scales learning to tens of millions of unlabeled frames via uncertainty-aware semi-supervised training. The system follows a teacher-student…
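
The abstract names causal temporal attention as one component of the model. As a rough illustration of what "causal" means here, the sketch below implements generic single-head self-attention over a sequence of per-frame features, masked so that each frame attends only to itself and earlier frames. It is not the paper's architecture: the function name, the identity query/key/value projections, and the feature shape `(T, D)` are simplifying assumptions made for this example.

```python
import numpy as np

def causal_self_attention(x):
    """Single-head self-attention with a causal (lower-triangular) mask.

    x: array of shape (T, D), one D-dimensional feature per frame.
    Assumption: queries, keys and values are x itself (no learned weights).
    Returns (output of shape (T, D), attention weights of shape (T, T)).
    """
    x = np.asarray(x, dtype=float)
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)

    # Mask out future frames: entry (t, s) is blocked when s > t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable row-wise softmax; the diagonal is never masked,
    # so every row has at least one finite score.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

rng = np.random.default_rng(0)
frames = rng.normal(size=(5, 4))
out, weights = causal_self_attention(frames)
print(np.allclose(np.triu(weights, k=1), 0.0))  # True: no attention to the future
```

Because the output at frame t depends only on frames up to t, a model built this way can run online on a headset without waiting for future frames, which is the usual motivation for causal masking in streaming settings.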

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis