EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere

Jiaxi Jiang; Paul Streli; Manuel Meier; Christian Holz

arXiv:2308.06493 · cs.CV · September 9, 2024 · 2 cites

Open Access

TL;DR

EgoPoser is a real-time egocentric full-body pose estimation method that works with sparse, intermittent observations, generalizes across users, and outperforms existing approaches in speed and accuracy.

Contribution

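It introduces a novel global motion decomposition that predicts full-body pose independent of global position, an efficient SlowFast module that captures longer motion time series, and robust pose modeling from hand tracking that is available only while the hands are inside the headset's field of view. Together these address key limitations of prior methods.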
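The idea of covering a longer motion history without feeding more frames into the model can be illustrated with a small frame-sampling sketch. This is only an assumption about what a SlowFast-style input selection might look like, not the paper's implementation: the function name `slowfast_indices` and the parameters `window_len`, `fast_len`, and `slow_stride` are hypothetical.

```python
def slowfast_indices(window_len, fast_len, slow_stride):
    """Pick frame indices for a hypothetical slow/fast input split.

    Assumed scheme (not taken from the paper):
    - fast branch: the most recent `fast_len` frames at full frame rate
    - slow branch: the whole window, keeping every `slow_stride`-th frame
    """
    if not (0 < fast_len <= window_len) or slow_stride < 1:
        raise ValueError("need 0 < fast_len <= window_len and slow_stride >= 1")

    # Fast branch: dense sampling of the latest frames.
    fast = list(range(window_len - fast_len, window_len))

    # Slow branch: strided sampling over the full window, aligned so that
    # the newest frame (index window_len - 1) is always included.
    start = (window_len - 1) % slow_stride
    slow = list(range(start, window_len, slow_stride))
    return slow, fast


slow, fast = slowfast_indices(window_len=8, fast_len=4, slow_stride=2)
print(slow)  # [1, 3, 5, 7]
print(fast)  # [4, 5, 6, 7]
```

With these example values the two branches together hold 8 frames, the same count as the dense window, while the slow branch alone spans almost the whole window with half as many frames. Doubling `window_len` at a fixed `fast_len` and a doubled `slow_stride` would extend the covered history without increasing the number of frames selected.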

Findings

01 Outperforms state-of-the-art methods qualitatively and quantitatively

02 Maintains over 600 fps inference speed

03 Generalizes across different body shapes and environments

Abstract

Full-body egocentric pose estimation from head and hand poses alone has become an active area of research to power articulate avatar representations on headset-based platforms. However, existing methods over-rely on the indoor motion-capture spaces in which datasets were recorded, while simultaneously assuming continuous joint motion capture and uniform body dimensions. We propose EgoPoser to overcome these limitations with four main contributions. 1) EgoPoser robustly models body pose from intermittent hand position and orientation tracking only when inside a headset's field of view. 2) We rethink input representations for headset-based ego-pose estimation and introduce a novel global motion decomposition method that predicts full-body pose independent of global positions. 3) We enhance pose estimation by capturing longer motion time series through an efficient SlowFast module design…
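Two of the ideas in the abstract can be sketched in a few lines: dropping hand observations that fall outside the headset's field of view, and expressing positions relative to the head's horizontal location so the input does not depend on where in the world the user stands. The following is a minimal sketch under stated assumptions, not the authors' code. The function names, the 120-degree default field of view, the +z-forward and y-up axis conventions, and zero-filling for unseen hands are all hypothetical choices made for illustration.

```python
import math


def hand_in_fov(hand_in_head, fov_deg=120.0):
    """Return True if a hand position (head frame) lies inside a viewing cone.

    Assumptions: +z is the headset's forward axis, and the field of view is
    modeled as a symmetric cone with full opening angle `fov_deg`.
    """
    x, y, z = hand_in_head
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return True  # hand at the head origin: treat as visible
    cos_angle = max(-1.0, min(1.0, z / norm))
    angle = math.degrees(math.acos(cos_angle))
    return angle <= fov_deg / 2.0


def mask_hand(hand_features, visible):
    """Zero out a hand's features when it is not tracked (assumed convention)."""
    return list(hand_features) if visible else [0.0] * len(hand_features)


def recenter_horizontal(positions, head_position):
    """Subtract the head's horizontal (x, z) position and keep height (y).

    Assumes a y-up world frame. The result no longer depends on the user's
    horizontal location in the world.
    """
    hx, _, hz = head_position
    return [(x - hx, y, z - hz) for (x, y, z) in positions]


# A hand in front of the face is kept; a hand behind the head is masked.
front, behind = (0.0, -0.2, 0.4), (0.1, -0.3, -0.4)
print(mask_hand(front, hand_in_fov(front)))    # [0.0, -0.2, 0.4]
print(mask_hand(behind, hand_in_fov(behind)))  # [0.0, 0.0, 0.0]
```

A model trained on inputs prepared this way would see hands appear and disappear as they cross the assumed cone boundary, which mirrors the intermittent tracking the abstract describes. How unseen hands are actually encoded, and how global motion is decomposed, should be taken from the paper itself.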

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Virtual Reality Applications and Impacts

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings