Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

Hiroyuki Deguchi; Ryosuke Hori; Kotaro Amaya; Tsubasa Maruyama; Mitsunori Tada; Hideo Saito

arXiv:2605.20889·cs.CV·May 21, 2026

TL;DR

MapMonoEgo estimates globally consistent, map-grounded human pose from monocular egocentric video by using a pre-scanned 3D point cloud to resolve scale ambiguity and recover the wearer's absolute location.

Contribution

The paper introduces MapMonoEgo, a framework for globally consistent human pose estimation from a monocular camera and a pre-scanned 3D point cloud, along with AIST-Living, a new dataset pairing egocentric video with ground-truth motion in a scanned environment.

Findings

01. Outperforms existing methods in global pose accuracy

02. Achieves consistent long-term tracking without multi-sensor hardware

03. Demonstrates practical utility in activity monitoring

Abstract

Monocular egocentric human pose estimation is essential for ubiquitous activity monitoring. However, understanding the user's absolute location within the environment remains a challenge. Existing methods primarily focus on relative motion from an initial position, and tend not to account for the wearer's absolute location within an environment. Furthermore, inherent scale ambiguity in monocular vision leads to severe translational drift, limiting long-term tracking without specialized multi-sensor hardware. To address this, we propose MapMonoEgo, a novel framework achieving globally consistent human pose estimation solely from a monocular camera by leveraging a pre-scanned 3D point cloud. We also introduce the AIST-Living dataset, a new dataset pairing egocentric video with ground-truth motion in a scanned environment. Experiments demonstrate that our approach significantly outperforms the…
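
The scale ambiguity mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method; it is a toy pinhole-camera example under assumed values (the `focal` length, the `scene` points, and the `cam_path` positions are all hypothetical). It shows that scaling the scene and the camera translation by the same factor leaves every image observation unchanged, which is why a single camera cannot recover metric translation without an external reference such as a metric map.

```python
# Toy illustration of monocular scale ambiguity (hypothetical values throughout).

def project(point, cam_pos, focal=500.0):
    """Pinhole projection of a 3D point seen from a camera at cam_pos.

    The camera is assumed to look along +Z with no rotation.
    """
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal * x / z, focal * y / z)


def scale(vec, s):
    """Multiply every coordinate of a 3D vector by s."""
    return tuple(s * c for c in vec)


def render(scene, cam_path):
    """Project every scene point from every camera position."""
    return [[project(p, c) for p in scene] for c in cam_path]


# Assumed scene points and camera trajectory, in arbitrary units.
scene = [(0.5, 0.2, 4.0), (-1.0, 0.3, 6.0), (0.2, -0.4, 3.0)]
cam_path = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.2), (0.2, 0.0, 0.4)]

# Scale the whole world (scene and camera motion) by the same factor.
s = 2.5
original = render(scene, cam_path)
scaled = render([scale(p, s) for p in scene], [scale(c, s) for c in cam_path])

# Largest difference between corresponding image coordinates.
max_diff = max(
    abs(a - b)
    for frame_o, frame_s in zip(original, scaled)
    for pt_o, pt_s in zip(frame_o, frame_s)
    for a, b in zip(pt_o, pt_s)
)
print(f"max image difference: {max_diff:.2e}")
```

The two renderings agree up to floating-point error, so the images alone cannot distinguish a walk of 0.4 units from one of 1.0 units. A map with known metric scale removes this freedom.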
