StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video
Huajian Zeng, Chaohua Yao, Yuantai Zhang, Jiaqi Yang, Rolandos Alexandros Potamias, Xingxing Zuo

TL;DR
StableHand introduces a quality-aware flow-matching approach that improves 4D world-space dual-hand motion estimation from egocentric video by leveraging per-frame observation quality signals.
Contribution
It proposes a novel quality-aware flow-matching framework conditioned on learned hand observation quality, enhancing robustness to occlusions and missing data in egocentric videos.
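To make the idea of conditioning on observation quality concrete, here is a minimal sketch of a generic conditional flow-matching training step with a linear noise-to-data path. It is not the authors' implementation. The function names, tensor shapes, the concatenation-based conditioning, and the linear stand-in model are all assumptions made for illustration. The quality vector has four unnamed placeholder channels only because the abstract mentions four channels.

```python
import numpy as np

def flow_matching_pair(x1, rng):
    """Build one training pair on a straight noise-to-data path.

    x1: clean target motion, shape (T, D) (hypothetical layout).
    Returns the interpolated state x_t, the time t, and the target velocity.
    """
    x0 = rng.standard_normal(x1.shape)  # Gaussian noise sample
    t = rng.uniform()                   # scalar time in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1       # point on the straight path
    v_target = x1 - x0                  # velocity is constant along this path
    return x_t, t, v_target

def build_model_input(x_t, t, obs, quality):
    """Concatenate state, time, noisy observations, and per-frame quality.

    Concatenation is an assumed conditioning scheme, chosen for simplicity.
    """
    num_frames = x_t.shape[0]
    t_col = np.full((num_frames, 1), t)
    return np.concatenate([x_t, t_col, obs, quality], axis=1)

def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))

rng = np.random.default_rng(0)
T, D = 8, 6                                    # frames, motion dims (illustrative)
x1 = rng.standard_normal((T, D))               # stand-in ground-truth motion
obs = x1 + 0.1 * rng.standard_normal((T, D))   # stand-in noisy observations
quality = rng.uniform(size=(T, 4))             # four placeholder quality channels

x_t, t, v_target = flow_matching_pair(x1, rng)
inp = build_model_input(x_t, t, obs, quality)
W = np.zeros((inp.shape[1], D))                # untrained linear stand-in model
v_pred = inp @ W
loss = fm_loss(v_pred, v_target)
```

In a real system the linear map would be replaced by a learned network, and the quality signal might gate or reweight the observations rather than simply being concatenated. The sketch only shows where such a signal could enter the training objective.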
Findings
Achieves state-of-the-art performance on HOT3D and ARCTIC benchmarks.
Reduces W-MPJPE by 20-25% over the strongest baseline.
Shows largest improvements on heavily occluded sequences.
Abstract
Recovering world-space 4D motion of two interacting hands from egocentric video is a fundamental capability for supervising robot policy learning, where wrist trajectories track the end-effector and finger articulations specify the grasp pose. Two major challenges arise in this setting: hands frequently leave the camera view for extended periods due to head motion, and persistent hand-object interactions cause severe occlusions of one or both hands. Existing methods uniformly condition on noisy hand motion observations without accounting for their per-frame reliability, leading to substantial performance degradation. Our key insight is that accurate world-space hand motion estimation is tightly coupled with the quality of per-frame hand observations. To this end, we decompose the quality of hand motion observations extracted from an off-the-shelf hand pose estimator into four channels:…
