AVI-HT: Adaptive Vision-IMU Fusion for 3D Hand Tracking

Ziyi Kou; Ankit Kumar; Mia Huang; Taylor Niehues; Vatsal Mehta; Ergys Ristani; Li Guan

arXiv:2605.21714 · cs.CV · May 22, 2026

TL;DR

AVI-HT adaptively fuses egocentric images with on-glove IMU signals for accurate 3D hand tracking, especially under visual occlusion, using a cross-sensor attention mechanism and a large real-world dataset.

Contribution

The paper proposes an adaptive vision-IMU fusion approach for 3D hand tracking, built on a cross-sensor attention mechanism, together with a large annotated vision-IMU dataset (DexGloveHOI).
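The cross-sensor attention idea, letting the model decide how much to trust the camera versus each IMU, can be illustrated with a minimal sketch. Everything here is an assumption rather than the paper's architecture: each sensor (the egocentric camera and each finger IMU) is taken to be already encoded into a fixed-size token, a single query vector attends over those tokens with single-head scaled dot-product attention, and keys and values are the tokens themselves. The softmax weights play the role of per-sensor "trust". The names `fuse_sensors`, `softmax`, and the sensor labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_sensors(query: Vector, tokens: Dict[str, Vector]) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical cross-sensor fusion via single-head scaled dot-product attention.

    Each entry of `tokens` is one sensor's feature vector (the camera or one IMU).
    Keys and values are the tokens themselves (identity projections); a trained
    model would learn these projections. Returns the fused feature vector and the
    per-sensor attention weights, which act as adaptive "trust" scores.
    """
    if not tokens:
        raise ValueError("need at least one sensor token")
    names = list(tokens)
    d = len(query)
    if any(len(tokens[n]) != d for n in names):
        raise ValueError("all tokens must have the same dimension as the query")
    # Scaled dot-product score between the query and each sensor token.
    scores = [sum(q * k for q, k in zip(query, tokens[n])) / math.sqrt(d) for n in names]
    weights = softmax(scores)
    # Fused feature: attention-weighted sum of the sensor tokens.
    fused = [sum(w * tokens[n][j] for w, n in zip(weights, names)) for j in range(d)]
    return fused, dict(zip(names, weights))


if __name__ == "__main__":
    query = [1.0, 0.0]
    tokens = {"vision": [1.0, 0.0], "imu_index": [0.0, 1.0], "imu_thumb": [0.0, 1.0]}
    fused, trust = fuse_sensors(query, tokens)
    print(fused, trust)
```

In this toy input the vision token points in the same direction as the query, so it receives the largest weight and the two IMU tokens tie. In a trained network the query and projections would be learned, so the weights could shift toward the IMUs when the camera view is occluded, which is the behavior the abstract attributes to AVI-HT; how AVI-HT actually parameterizes its attention is not specified here.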

Findings

01. AVI-HT reduces mean keypoint error by 16.1%.
02. The wrist-aligned variant reduces error by 24.2%.
03. Ablation studies show how much each finger's IMU contributes.
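The findings report relative error reductions, but the exact evaluation protocol is not given here. The sketch below assumes the usual definition of mean keypoint error (average Euclidean distance between matched predicted and ground-truth 3D keypoints) and shows one common reading of "wrist-aligned": translating both skeletons so the wrist sits at the origin before measuring. Whether the paper's wrist-aligned figure refers to this protocol or to a distinct model variant is not stated in this summary. The function names and the default wrist index are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def mean_keypoint_error(pred: Sequence[Point3], gt: Sequence[Point3]) -> float:
    """Mean Euclidean distance between matched predicted and ground-truth 3D keypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def wrist_aligned(points: Sequence[Point3], wrist_index: int = 0) -> List[Point3]:
    """Translate keypoints so the wrist sits at the origin.

    Assumes the wrist is keypoint `wrist_index` (0 in the common 21-keypoint layout).
    """
    wx, wy, wz = points[wrist_index]
    return [(x - wx, y - wy, z - wz) for x, y, z in points]


def relative_reduction(baseline_err: float, new_err: float) -> float:
    """Percentage by which new_err improves on baseline_err."""
    if baseline_err <= 0:
        raise ValueError("baseline_err must be positive")
    return 100.0 * (baseline_err - new_err) / baseline_err
```

A relative reduction of 16.1% means the new error equals the baseline error times (1 - 0.161). With purely illustrative numbers that are not from the paper, relative_reduction(10.0, 8.39) returns approximately 16.1.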

Abstract

We present AVI-HT, an adaptive visual-IMU fusion approach for tracking 3D hand poses by jointly modeling the egocentric image with on-glove 6-DoF IMU signals. AVI-HT achieves significantly improved accuracy and availability, particularly in hand-object interaction (HOI) scenarios involving heavy visual occlusion. Two complementary ingredients underpin its success: (1) synchronized multi-modal training data pairing on-body vision-IMU sensor streams with ground-truth 3D hand poses from a motion-capture system, and (2) a cross-sensor deep attention mechanism that adaptively modulates the trust assigned to the vision and individual IMU sensors. To evaluate AVI-HT in real-world settings, we conduct extensive experiments on our DexGloveHOI dataset, which consists of 100K+ paired vision-IMU samples with synchronized 3D annotated poses, in which users manipulate a variety of objects during…
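The first ingredient is synchronized vision-IMU data. The paper's actual synchronization procedure is not described in this summary; a common approach, assumed in this sketch, resamples the IMU stream at each camera timestamp by linearly interpolating between the two surrounding IMU samples. Each 6-DoF sample is assumed to hold 3-axis accelerometer and 3-axis gyroscope readings, and `interpolate_imu` and `align_streams` are hypothetical names.

```python
import bisect
from typing import List, Sequence, Tuple

# Assumed layout: (ax, ay, az, gx, gy, gz), i.e. 3-axis accelerometer + 3-axis gyroscope.
ImuSample = Tuple[float, ...]


def interpolate_imu(imu_t: Sequence[float], imu_v: Sequence[ImuSample], frame_t: float) -> ImuSample:
    """Linearly interpolate a 6-DoF IMU reading at a camera-frame timestamp.

    imu_t must be sorted ascending; timestamps outside the IMU range are clamped
    to the first or last sample instead of being extrapolated.
    """
    if not imu_t or len(imu_t) != len(imu_v):
        raise ValueError("imu_t and imu_v must be non-empty and of equal length")
    if frame_t <= imu_t[0]:
        return tuple(imu_v[0])
    if frame_t >= imu_t[-1]:
        return tuple(imu_v[-1])
    # bisect_right gives i with imu_t[i - 1] <= frame_t < imu_t[i].
    i = bisect.bisect_right(imu_t, frame_t)
    t0, t1 = imu_t[i - 1], imu_t[i]
    a = (frame_t - t0) / (t1 - t0)
    return tuple((1.0 - a) * v0 + a * v1 for v0, v1 in zip(imu_v[i - 1], imu_v[i]))


def align_streams(
    frame_ts: Sequence[float], imu_t: Sequence[float], imu_v: Sequence[ImuSample]
) -> List[Tuple[float, ImuSample]]:
    """Pair every camera frame timestamp with an IMU reading resampled at that time."""
    return [(t, interpolate_imu(imu_t, imu_v, t)) for t in frame_ts]
```

Clamping out-of-range timestamps is a conservative choice for a sketch; a real pipeline might instead drop such frames or first estimate a clock offset between the camera and the glove.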
