Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data

Seunggeun Chi, Pin-Hao Huang, Enna Sachdeva, Hengbo Ma, Karthik Ramani, and Kwonjoon Lee

arXiv:2411.03561·cs.CV·November 7, 2024

TL;DR

This paper presents a novel two-stage method for estimating full-body pose from sparse egocentric video data, combining temporal imputation and spatial generation to improve accuracy and robustness.

Contribution

It introduces a two-stage approach using masked autoencoders and diffusion models to estimate full-body pose from sparse head and hand observations in egocentric videos.

Findings

01. Effective in estimating full-body pose from sparse data

02. Improves over naive diffusion model applications

03. Validated on multiple datasets with strong results

Abstract

We study the problem of estimating the body movements of a camera wearer from egocentric videos. Current methods for ego-body pose estimation rely on temporally dense sensor data, such as IMU measurements from spatially sparse body parts like the head and hands. However, we propose that even temporally sparse observations, such as hand poses captured intermittently from egocentric videos during natural or periodic hand movements, can effectively constrain overall body motion. Naively applying diffusion models to generate full-body pose from head pose and sparse hand pose leads to suboptimal results. To overcome this, we develop a two-stage approach that decomposes the problem into temporal completion and spatial completion. First, our method employs masked autoencoders to impute hand trajectories by leveraging the spatiotemporal correlations between the head pose sequence and…
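To make the "doubly sparse" setting concrete, the following is a minimal sketch of the data layout the abstract describes: head pose is available at every frame, while hand pose is available only at frames where the hands are visible. The array shapes, function names, and the linear interpolation used to fill missing hand frames are all assumptions for illustration. In particular, the interpolation is only a stand-in with the same input and output shape as the temporal completion stage; the paper itself uses a masked autoencoder for that step.

```python
import numpy as np

def impute_hands_linear(hands, visible):
    """Placeholder for the temporal completion stage.

    hands:   (T, D) array of per-frame hand features (hypothetical layout).
    visible: (T,) boolean array, True where the hands were observed.

    NOTE: the paper imputes hand trajectories with a masked autoencoder;
    linear interpolation is swapped in here purely to show the interface.
    """
    if not visible.any():
        raise ValueError("at least one observed hand frame is required")
    t = np.arange(hands.shape[0])
    observed_t = t[visible]
    filled = np.empty(hands.shape, dtype=float)
    for d in range(hands.shape[1]):
        # Interpolate each feature dimension over the observed frames only.
        filled[:, d] = np.interp(t, observed_t, hands[visible, d])
    return filled

def build_condition(head, hands_filled):
    """Concatenate dense head pose with completed hand pose, frame by frame.

    The result is the kind of per-frame conditioning signal a spatial
    completion stage (a diffusion model in the paper) could consume.
    """
    return np.concatenate([head, hands_filled], axis=1)

# Toy sequence: 5 frames, 3-D head feature, 1-D hand feature.
head = np.zeros((5, 3))
hands = np.array([[0.0], [9.0], [9.0], [9.0], [4.0]])  # middle values are unobserved junk
visible = np.array([True, False, False, False, True])

hands_filled = impute_hands_linear(hands, visible)
condition = build_condition(head, hands_filled)
```

In this toy case only the first and last frames carry real hand observations, so the filled trajectory is a straight line between them and the conditioning array has one row per frame.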

Videos

Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Virtual Reality Applications and Impacts

Methods: Diffusion