HandCept: A Visual-Inertial Fusion Framework for Accurate Proprioception in Dexterous Hands
Junda Huang, Jianshu Zhou, Honghao Guo, Yunhui Liu

TL;DR
HandCept introduces a real-time visual-inertial fusion framework using a wrist-mounted camera and IMUs, achieving accurate, drift-free joint angle estimation in dexterous robotic hands, enhancing manipulation capabilities.
Contribution
The paper presents a novel zero-shot learning visual-inertial fusion framework with a latency-free EKF for accurate proprioception in dexterous hands, including a high-fidelity rendering pipeline for sim-to-real transfer.
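To illustrate the general idea of visual-inertial fusion for a single joint, here is a minimal sketch. It is not the paper's filter: the two-element state (joint angle plus gyroscope bias), the noise parameters, and the function name `fuse_joint_angle` are all assumptions made for illustration. Because this simplified model is linear, the EKF reduces to an ordinary Kalman filter, and no handling of camera latency is attempted.

```python
import numpy as np

def fuse_joint_angle(gyro_rates, visual_angles, dt,
                     q_angle=1e-4, q_bias=1e-6, r_visual=4.0):
    """Fuse gyro rates (deg/s) with visual angle estimates (deg) for one joint.

    Hypothetical sketch: the state is [angle, gyro_bias]. Entries of
    visual_angles may be np.nan when no visual estimate is available.
    """
    x = np.zeros(2)                          # initial state guess
    P = np.diag([10.0, 1.0])                 # initial covariance (assumed)
    F = np.array([[1.0, -dt], [0.0, 1.0]])   # state transition matrix
    Q = np.diag([q_angle, q_bias])           # process noise (assumed)
    H = np.array([1.0, 0.0])                 # vision observes the angle only
    estimates = []
    for rate, z in zip(gyro_rates, visual_angles):
        # Predict: integrate the bias-corrected gyro rate.
        x = np.array([x[0] + dt * (rate - x[1]), x[1]])
        P = F @ P @ F.T + Q
        # Update: correct with the visual angle when one is available.
        if not np.isnan(z):
            innovation = z - x[0]
            S = P[0, 0] + r_visual           # innovation variance
            K = P[:, 0] / S                  # Kalman gain, shape (2,)
            x = x + K * innovation
            P = (np.eye(2) - np.outer(K, H)) @ P
        estimates.append(x[0])
    return np.array(estimates)
```

In this sketch the gyroscope supplies smooth short-term motion, while the visual estimate anchors the absolute angle and makes the gyro bias observable, which is why the fused estimate does not accumulate drift the way pure integration does.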
Findings
Achieves joint angle errors between 2° and 4° without drift.
Outperforms visual-only and inertial-only methods.
Provides a stable, calibrated IMU system with a common base frame.
Abstract
As robotics progresses toward general manipulation, dexterous hands are becoming increasingly critical. However, proprioception in dexterous hands remains a bottleneck due to limitations in volume and generality. In this work, we present HandCept, a novel visual-inertial proprioception framework designed to overcome the challenges of traditional joint angle estimation methods. HandCept addresses the difficulty of achieving accurate and robust joint angle estimation in dynamic environments where both visual and inertial measurements are prone to noise and drift. It leverages a zero-shot learning approach using a wrist-mounted RGB-D camera and 9-axis IMUs, fused in real time via a latency-free Extended Kalman Filter (EKF). Our results show that HandCept achieves joint angle estimation errors between 2° and 4° without observable drift, outperforming visual-only and…
