General In-Hand Object Rotation with Vision and Touch
Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

TL;DR
This paper presents RotateIt, a system that uses multimodal vision and touch inputs, trained in simulation and distilled for real-world deployment, to enable precise in-hand object rotation along multiple axes.
Contribution
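The paper introduces a visuotactile transformer that fuses visual, tactile, and proprioceptive inputs for in-hand object rotation. The policy is first trained in simulation with access to ground-truth object shapes and physical properties, then distilled to operate on realistic, noisy sensory inputs.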
Findings
Significant performance improvements over prior methods
Effective fusion of visual and tactile sensing
Successful deployment on realistic noisy sensory inputs
Abstract
We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.
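The two-stage recipe in the abstract (a policy trained with ground-truth information, then distilled into one that sees only fused sensory inputs) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the single-head attention with identity projections, the one-token-per-modality layout, the mean pooling, the mean-squared-error distillation loss, and all function names. It shows only the general idea of regressing a fused student embedding onto a teacher embedding.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative only)."""
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in tokens])
        fused.append(
            [sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)]
        )
    return fused


def student_latent(vision, touch, proprio):
    """Fuse one token per modality, then mean-pool into a single embedding."""
    fused = self_attention([vision, touch, proprio])
    dim = len(vision)
    return [sum(token[i] for token in fused) / len(fused) for i in range(dim)]


def distillation_loss(student_z, teacher_z):
    """Mean squared error between student and teacher embeddings."""
    return sum((s - t) ** 2 for s, t in zip(student_z, teacher_z)) / len(teacher_z)


# Hypothetical data: random vectors stand in for encoded sensor features and
# for a teacher embedding computed from ground-truth shape and physics.
rng = random.Random(0)
dim = 4
vision = [rng.gauss(0, 1) for _ in range(dim)]
touch = [rng.gauss(0, 1) for _ in range(dim)]
proprio = [rng.gauss(0, 1) for _ in range(dim)]
teacher_z = [rng.gauss(0, 1) for _ in range(dim)]

z = student_latent(vision, touch, proprio)
loss = distillation_loss(z, teacher_z)
print("student embedding size:", len(z))
print("distillation loss:", loss)
```

In a real system the projections and encoders would be learned and the loss would be minimized by gradient descent over many simulated rollouts; the sketch only computes the quantity such a procedure might minimize.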
Taxonomy
Topics: Tactile and Sensory Interactions · Interactive and Immersive Displays · Hand Gesture Recognition Systems
