Visuo-Acoustic Hand Pose and Contact Estimation
Yuemin Mao, Uksang Yoo, Yunchao Yao, Shahram Najam Syed, Luca Bondi, Jonathan Francis, Jean Oh, Jeffrey Ichnowski

TL;DR
VibeMesh is a wearable visuo-acoustic system that combines vision and active acoustic sensing with a graph neural network to accurately estimate hand pose and contact points, especially under occlusion.
Contribution
The paper introduces a novel, non-intrusive visuo-acoustic platform and a cross-modal graph network for dense hand pose and contact estimation, along with a new dataset.
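The summary does not describe the network's internals. The sketch below shows one way a cross-modal graph network for per-vertex contact could look. It runs a single-head graph-attention layer (GAT-style) over a hand-mesh vertex graph, appends one shared audio embedding to every vertex feature before attention, and maps each vertex to a contact probability. The concatenation-based fusion, the function name `gat_contact`, the toy mesh, and all weights are hypothetical assumptions, not the authors' architecture.

```python
import math
import random


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(w: list[list[float]], v: list[float]) -> list[float]:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gat_contact(vertex_feats, audio_feat, edges, w, a, w_out, b_out):
    """Hypothetical single-head graph-attention layer plus per-vertex contact head.

    vertex_feats: per-vertex geometric features (one list per mesh vertex)
    audio_feat:   one audio embedding shared by all vertices (assumed fusion)
    edges:        undirected mesh edges as (i, j) index pairs
    w:            projection matrix, shape d_out x (d_vertex + d_audio)
    a:            attention vector of length 2 * d_out
    w_out, b_out: linear contact head
    """
    n = len(vertex_feats)
    neighbors = [{i} for i in range(n)]  # include self-loops
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    # Cross-modal fusion (assumption): concatenate the audio embedding onto each vertex.
    z = [matvec(w, v + audio_feat) for v in vertex_feats]
    d = len(z[0])
    probs = []
    for i in range(n):
        nbrs = sorted(neighbors[i])
        # Attention logits e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [leaky_relu(sum(ak * x for ak, x in zip(a, z[i] + z[j]))) for j in nbrs]
        # Softmax over the neighborhood, max-shifted for stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # Attention-weighted aggregation of projected neighbor features.
        h = [sum(alpha[k] * z[j][c] for k, j in enumerate(nbrs)) for c in range(d)]
        logit = sum(wc * hc for wc, hc in zip(w_out, h)) + b_out
        probs.append(sigmoid(logit))
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    d_v, d_a, d_out = 3, 4, 8
    verts = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(4)]
    audio = [rng.uniform(-1, 1) for _ in range(d_a)]
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # toy mesh: a triangle plus one extra edge
    W = [[rng.gauss(0, 0.3) for _ in range(d_v + d_a)] for _ in range(d_out)]
    a = [rng.gauss(0, 0.3) for _ in range(2 * d_out)]
    w_out = [rng.gauss(0, 0.3) for _ in range(d_out)]
    print(gat_contact(verts, audio, edges, W, a, w_out, 0.0))
```

Concatenating one global audio embedding is the simplest possible fusion choice. Attaching per-microphone embeddings to nearby vertices would be an alternative, but the summary confirms neither.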
Findings
Outperforms vision-only methods in accuracy
Robust in occluded and static-contact scenarios
Provides dense, high-resolution contact predictions
Abstract
Accurately estimating hand pose and hand-object contact events is essential for robot data collection, immersive virtual environments, and biomechanical analysis, yet remains challenging due to visual occlusion, subtle contact cues, limitations in vision-only sensing, and the lack of accessible and flexible tactile sensing. We therefore introduce VibeMesh, a novel wearable system that fuses vision with active acoustic sensing for dense, per-vertex hand contact and pose estimation. VibeMesh integrates a bone-conduction speaker and sparse piezoelectric microphones distributed on a human hand: the speaker emits structured acoustic signals, and the microphones capture their propagation to infer changes induced by contact. To interpret these cross-modal signals, we propose a graph-based attention network that processes synchronized audio spectra and RGB-D-derived hand meshes to predict contact with high spatial…
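The abstract says the speaker emits "structured acoustic signals" and that the network consumes "audio spectra", but it specifies neither the excitation nor the spectral features. The minimal sketch below assumes a linear chirp excitation and a per-frequency log-magnitude ratio against a no-contact baseline recording. It shows how a change in propagation could become a spectral feature vector. The chirp, the baseline-ratio feature, and the helper names (`linear_chirp`, `magnitude_spectrum`, `contact_feature`) are illustrative assumptions.

```python
import cmath
import math


def linear_chirp(f0: float, f1: float, duration: float, fs: int) -> list[float]:
    """Linear sweep from f0 to f1 Hz (an assumed excitation, not the paper's signal)."""
    n = round(duration * fs)
    k = (f1 - f0) / duration  # sweep rate in Hz per second
    # Instantaneous phase 2*pi*(f0*t + k*t^2/2) gives instantaneous frequency f0 + k*t.
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def magnitude_spectrum(signal: list[float]) -> list[float]:
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (use an FFT library in practice)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
            for k in range(n // 2 + 1)]


def contact_feature(received: list[float], baseline: list[float], eps: float = 1e-9) -> list[float]:
    """Hypothetical feature: per-bin log-magnitude ratio vs. a no-contact baseline."""
    rx = magnitude_spectrum(received)
    ref = magnitude_spectrum(baseline)
    return [math.log((a + eps) / (b + eps)) for a, b in zip(rx, ref)]


if __name__ == "__main__":
    fs = 8000
    chirp = linear_chirp(500.0, 2500.0, 0.064, fs)  # 512 samples
    # Toy "contact": a uniform 50% attenuation of the received signal.
    received = [0.5 * x for x in chirp]
    feat = contact_feature(received, chirp)
    print(len(feat), min(feat), max(feat))
```

In this toy case, a uniform 50% attenuation yields a flat feature near log 0.5 across the excited band. A frequency-dependent change would instead produce a non-flat profile.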
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Robot Manipulation and Learning
