EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
Jiaxi Jiang, Paul Streli, Manuel Meier, Christian Holz

TL;DR
EgoPoser is a real-time egocentric full-body pose estimation method that works with sparse, intermittent observations, generalizes across users, and outperforms existing approaches in speed and accuracy.
Contribution
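It introduces a novel global motion decomposition that predicts full-body pose independent of global position, a SlowFast module that efficiently captures longer motion time series, and robust modeling of hand tracking that is available only while the hands are inside the headset's field of view, addressing key limitations of prior methods.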
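To make the idea of predicting pose "independent of global positions" concrete, here is a minimal sketch of one way a window of tracked positions could be re-expressed without absolute horizontal location. It is an illustration under stated assumptions, not the paper's formulation: the function name `decompose_global_motion`, the y-up coordinate convention, and the choice to subtract the first frame's horizontal head position are all hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z); y is ASSUMED to point up


def decompose_global_motion(
    head: List[Vec3], left: List[Vec3], right: List[Vec3]
) -> List[Tuple[Vec3, Vec3, Vec3]]:
    """Re-express a window of positions without absolute horizontal location.

    Hypothetical scheme (an assumption, not taken from the paper):
    - the head keeps its height (y) but its horizontal position (x, z) is
      measured relative to the first frame of the window;
    - each hand is expressed relative to the head in the same frame.
    """
    x0, _, z0 = head[0]
    out = []
    for h, l, r in zip(head, left, right):
        head_local = (h[0] - x0, h[1], h[2] - z0)
        left_rel = (l[0] - h[0], l[1] - h[1], l[2] - h[2])
        right_rel = (r[0] - h[0], r[1] - h[1], r[2] - h[2])
        out.append((head_local, left_rel, right_rel))
    return out
```

With this representation, the same motion performed at two different places in a room yields identical features, which is the property the contribution describes.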
Findings
Outperforms state-of-the-art methods qualitatively and quantitatively
Maintains an inference speed of over 600 fps
Generalizes across different body shapes and environments
Abstract
Full-body egocentric pose estimation from head and hand poses alone has become an active area of research to power articulate avatar representations on headset-based platforms. However, existing methods over-rely on the indoor motion-capture spaces in which datasets were recorded, while simultaneously assuming continuous joint motion capture and uniform body dimensions. We propose EgoPoser to overcome these limitations with four main contributions. 1) EgoPoser robustly models body pose from intermittent hand position and orientation tracking only when inside a headset's field of view. 2) We rethink input representations for headset-based ego-pose estimation and introduce a novel global motion decomposition method that predicts full-body pose independent of global positions. 3) We enhance pose estimation by capturing longer motion time series through an efficient SlowFast module design…
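The first contribution, hand tracking that is available only inside the headset's field of view, can be illustrated with a small masking sketch. Everything specific here is an assumption for illustration: the head-local frame (x right, y up, z forward), the 120° by 90° field of view, the zero-fill with a visibility flag, and the names `hand_in_fov` and `mask_hand`. The abstract does not specify these details.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # head-local (x right, y up, z forward): ASSUMED


def hand_in_fov(hand_local: Vec3, h_fov_deg: float = 120.0,
                v_fov_deg: float = 90.0) -> bool:
    """Return True if a hand position lies inside a rectangular view frustum.

    The field-of-view angles are placeholder values, not taken from the paper.
    """
    x, y, z = hand_local
    if z <= 0.0:  # at or behind the headset plane: never visible
        return False
    yaw = math.degrees(math.atan2(abs(x), z))    # horizontal angle off-axis
    pitch = math.degrees(math.atan2(abs(y), z))  # vertical angle off-axis
    return yaw <= h_fov_deg / 2 and pitch <= v_fov_deg / 2


def mask_hand(hand_local: Vec3) -> Tuple[float, float, float, float]:
    """Return (x, y, z, visible); zero out the position when the hand is unseen."""
    if hand_in_fov(hand_local):
        return (hand_local[0], hand_local[1], hand_local[2], 1.0)
    return (0.0, 0.0, 0.0, 0.0)
```

Applying such a mask to training data recorded with continuous tracking is one plausible way to expose a model to the intermittent observations it would see on a real headset.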
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Virtual Reality Applications and Impacts
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
