EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera
Christen Millerdurai, Shaoxiang Wang, Yaxu Xie, Vladislav Golyanik, Didier Stricker, Alain Pagani

TL;DR
EgoForce is a monocular framework that reconstructs absolute 3D hand pose and camera-space position from a single egocentric camera. It works across fisheye, perspective, and distorted wide-FOV camera models, targeting AR/VR interaction.
Contribution
EgoForce combines a differentiable forearm model, a unified transformer, and a ray-space solver for robust, device-agnostic 3D hand reconstruction from monocular egocentric images.
Findings
Achieves state-of-the-art 3D accuracy on egocentric benchmarks.
Reduces camera-space mean per-joint position error (MPJPE) by up to 28% on the HOT3D dataset.
Maintains consistent performance across diverse head-mounted camera models.
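To make the camera-space MPJPE figure above concrete, here is a minimal sketch of how the metric is commonly computed and how it differs from a root-relative variant. The function names, the 21-joint layout, and the toy numbers are illustrative assumptions, not taken from the paper or its evaluation code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def root_relative_mpjpe(pred, gt, root=0):
    """MPJPE after subtracting a root joint (assumed index 0) from both poses."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return mpjpe(pred - pred[root], gt - gt[root])

# Toy data (hypothetical): 21 joints in millimetres, about 400 mm from the camera.
gt = np.zeros((21, 3))
gt[:, 0] = np.arange(21) * 5.0
gt[:, 2] = 400.0

# A prediction with the correct articulation but a 30 mm depth offset.
pred = gt + np.array([0.0, 0.0, 30.0])

camera_space_error = mpjpe(pred, gt)             # penalizes the global offset
root_relative_error = root_relative_mpjpe(pred, gt)  # ignores the global offset
```

In this toy case the camera-space error equals the 30 mm offset while the root-relative error is zero, which is why a camera-space metric is the stricter test of absolute hand position.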
Abstract
Reconstructing the absolute 3D pose and shape of the hands from the user's viewpoint using a single head-mounted camera is crucial for practical egocentric interaction in AR/VR, telepresence, and hand-centric manipulation tasks, where sensing must remain compact and unobtrusive. While monocular RGB methods have made progress, they remain constrained by depth-scale ambiguity and struggle to generalize across the diverse optical configurations of head-mounted devices. As a result, models typically require extensive training on device-specific datasets, which are costly and laborious to acquire. This paper addresses these challenges by introducing EgoForce, a monocular 3D hand reconstruction framework that recovers robust, absolute 3D hand pose and its position from the user's (camera-space) viewpoint. EgoForce operates across fisheye, perspective, and distorted wide-FOV camera models…
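The depth-scale ambiguity mentioned in the abstract can be illustrated with a small sketch: under an ideal pinhole model, scaling a 3D point set uniformly about the camera centre leaves its 2D projection unchanged. The intrinsics and hand points below are made-up values for illustration only, and the pinhole model is a simplifying assumption; it does not represent EgoForce's camera handling or solver.

```python
import numpy as np

def project_pinhole(points, fx, fy, cx, cy):
    """Project Nx3 camera-space points to Nx2 pixels with an ideal pinhole model."""
    points = np.asarray(points, dtype=float)
    u = fx * points[:, 0] / points[:, 2] + cx
    v = fy * points[:, 1] / points[:, 2] + cy
    return np.stack([u, v], axis=-1)

# Hypothetical intrinsics and three hand keypoints (metres, camera space).
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
hand = np.array([
    [0.02, -0.01, 0.40],
    [0.05,  0.03, 0.42],
    [-0.03, 0.02, 0.38],
])

# A hand 1.5x larger and 1.5x farther away: every point scaled about the camera centre.
scaled_hand = 1.5 * hand

px_a = project_pinhole(hand, fx, fy, cx, cy)
px_b = project_pinhole(scaled_hand, fx, fy, cx, cy)

# Both configurations land on the same pixels, so the image alone cannot
# distinguish them without an additional cue for scale or depth.
same_pixels = bool(np.allclose(px_a, px_b))
```

Because both hands produce identical pixels, a monocular method needs some extra source of metric information to recover absolute position.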
