Towards Egocentric 3D Hand Pose Estimation in Unseen Domains

Wiktor Mucha; Michael Wray; Martin Kampel

arXiv:2601.06537 · cs.CV · January 13, 2026

Open Access

TL;DR

This paper introduces V-HPOT, a camera-agnostic, self-supervised approach for egocentric 3D hand pose estimation that significantly improves cross-domain performance without extensive training data.

Contribution

V-HPOT's key innovation is estimating normalized keypoint depth and applying self-supervised test-time optimization, enabling robust cross-domain hand pose estimation.

Findings

01. Achieves a 71% reduction in mean pose error on the H2O dataset.

02. Achieves a 41% reduction in mean pose error on the AssemblyHands dataset.

03. Outperforms all single-stage methods and rivals two-stage approaches while using less data.

Abstract

We present V-HPOT, a novel approach for improving the cross-domain performance of 3D hand pose estimation from egocentric images across diverse, unseen domains. State-of-the-art methods demonstrate strong performance when trained and tested within the same domain. However, they struggle to generalise to new environments because of limited training data and because their depth perception overfits to specific camera intrinsics. Our method addresses this by estimating keypoint z-coordinates in a virtual camera space, normalised by focal length and image size, enabling camera-agnostic depth prediction. We further leverage this invariance to camera intrinsics to propose a self-supervised test-time optimisation strategy that refines the model's depth perception during inference. This is achieved by applying a 3D consistency loss between predicted and in-space scale-transformed hand poses, allowing the…
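The idea of normalising depth by focal length and image size can be illustrated with a minimal sketch. The formula below (metric depth multiplied by image size and divided by focal length in pixels) is an assumption chosen for illustration, not the paper's stated definition, and the function names `normalise_depth` and `denormalise_depth` are hypothetical. It shows why such a quantity is camera-agnostic: two cameras that see the same hand at the same apparent pixel size yield the same normalised value.

```python
def normalise_depth(z_metric: float, focal_px: float, image_size_px: float) -> float:
    """Map a metric depth to a camera-agnostic value.

    Assumed formula (illustration only): z * image_size / focal_length.
    """
    if focal_px <= 0 or image_size_px <= 0:
        raise ValueError("focal length and image size must be positive")
    return z_metric * image_size_px / focal_px


def denormalise_depth(z_norm: float, focal_px: float, image_size_px: float) -> float:
    """Invert normalise_depth for a specific camera."""
    if focal_px <= 0 or image_size_px <= 0:
        raise ValueError("focal length and image size must be positive")
    return z_norm * focal_px / image_size_px


# Two hypothetical cameras with the same image width but different focal lengths.
# A hand at 0.5 m under camera A projects to the same pixel size as the same
# hand at 1.0 m under camera B, because projected size scales with focal / depth.
z_a = normalise_depth(z_metric=0.5, focal_px=600.0, image_size_px=1280.0)
z_b = normalise_depth(z_metric=1.0, focal_px=1200.0, image_size_px=1280.0)

# Metric depth is recovered once the target camera's intrinsics are known.
z_back = denormalise_depth(z_a, focal_px=600.0, image_size_px=1280.0)
```

Under this assumption, a network that predicts the normalised value does not need to memorise one camera's intrinsics; the metric depth is recovered afterwards from whichever intrinsics apply at test time.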

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Hand Gesture Recognition Systems