FingerViP: Learning Real-World Dexterous Manipulation with Fingertip Visual Perception
Zhen Zhang, Weinan Wang, Hejia Sun, Qingpeng Ding, Xiangyu Chu, Guoxin Fang, K. W. Samuel Au

TL;DR
FingerViP introduces fingertip cameras on a dexterous hand to enhance visual perception, enabling robust learning of complex manipulation tasks directly from demonstrations.
Contribution
The paper presents a novel multi-view fingertip visual perception system integrated with a diffusion-based visuomotor policy for dexterous manipulation.
Findings
FingerViP achieves an 80.8% success rate across a range of real-world tasks.
Enhanced visual perception improves robustness and adaptability.
Fingertip cameras provide comprehensive multi-view feedback.
Abstract
The current practice of dexterous manipulation generally relies on a single wrist-mounted view, which is often occluded and limits performance on tasks requiring multi-view perception. In this work, we present FingerViP, a learning system that utilizes a visuomotor policy with fingertip visual perception for dexterous manipulation. Specifically, we design a vision-enhanced fingertip module with an embedded miniature camera and install the modules on each finger of a multi-fingered hand. The fingertip cameras substantially improve visual perception by providing comprehensive, multi-view feedback of both the hand and its surrounding environment. Building on the integrated fingertip modules, we develop a diffusion-based whole-body visuomotor policy conditioned on a third-view camera and multi-view fingertip vision, which effectively learns complex manipulation skills directly from human…
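As a rough illustration of what "a diffusion-based policy conditioned on a third-view camera and multi-view fingertip vision" can mean in practice, the sketch below concatenates one feature per camera into a conditioning vector and runs a standard DDPM-style reverse (denoising) loop to produce an action. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`encode_view`, `build_condition`, `toy_noise_predictor`, `sample_action`), the mean-pixel "encoder", the linear noise schedule, and the toy noise predictor that stands in for a trained network.

```python
import math
import random


def encode_view(image):
    # Hypothetical encoder: reduces a flat list of pixel intensities to one feature.
    # A real system would use a learned image encoder here.
    return sum(image) / len(image)


def build_condition(third_view, fingertip_views):
    # One feature for the third-view camera, then one per fingertip camera.
    return [encode_view(third_view)] + [encode_view(v) for v in fingertip_views]


def toy_noise_predictor(noisy_action, cond, alpha_bar_t):
    # Stand-in for a trained noise-prediction network (assumption).
    # It pretends the clean action equals mean(cond) in every dimension and
    # returns the noise implied by x_t = sqrt(abar) * x_0 + sqrt(1 - abar) * eps.
    target = sum(cond) / len(cond)
    scale = math.sqrt(1.0 - alpha_bar_t)
    return [(a - math.sqrt(alpha_bar_t) * target) / scale for a in noisy_action]


def sample_action(cond, action_dim=3, num_steps=10, seed=0):
    rng = random.Random(seed)
    # Illustrative linear beta schedule; the values are not from the paper.
    betas = [1e-4 + (0.02 - 1e-4) * i / (num_steps - 1) for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Start from Gaussian noise and denoise step by step.
    action = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, cond, alpha_bars[t])
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(a - coef * e) / math.sqrt(alphas[t]) for a, e in zip(action, eps)]
        if t > 0:
            # Add sampling noise on every step except the last.
            sigma = math.sqrt(betas[t])
            action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            action = mean
    return action


if __name__ == "__main__":
    # Two-pixel "images": one third-view frame and two fingertip frames.
    cond = build_condition([0.0, 1.0], [[1.0, 1.0], [0.0, 0.0]])
    print(sample_action(cond))
```

Because the toy predictor always assumes the same clean action, the sampler simply converges to that value; with a trained network in its place, the same loop would produce actions that depend on what each camera observes.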
