Visual-auditory Extrinsic Contact Estimation
Xili Yi, Jayjun Lee, Nima Fazeli

TL;DR
This paper introduces a novel visual-auditory system for robotic extrinsic contact estimation, combining vision and active audio sensing to improve contact detection in cluttered, occluded environments, trained entirely in simulation with zero-shot real-world transfer.
Contribution
It presents a new multimodal perception pipeline that integrates visual and auditory data for contact estimation, including a real-to-sim audio hallucination technique for sim-to-real transfer.
Findings
Accurately estimates contact location and size in complex scenarios.
Enhances policy learning for contact-rich manipulation tasks.
Achieves zero-shot transfer from simulation to real-world environments.
Abstract
Robust manipulation often hinges on a robot's ability to perceive extrinsic contacts, i.e., contacts between a grasped object and its surrounding environment. However, these contacts are difficult to observe through vision alone due to occlusions, limited resolution, and ambiguous near-contact states. In this paper, we propose a visual-auditory method for extrinsic contact estimation that integrates global scene information from vision with local contact cues obtained through active audio sensing. Our approach equips a robotic gripper with contact microphones and conduction speakers, enabling the system to emit and receive acoustic signals through the grasped object to detect external contacts. We train our perception pipeline entirely in simulation and transfer it zero-shot to the real world. To bridge the sim-to-real gap, we introduce a real-to-sim audio hallucination technique, injecting…
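The abstract describes active audio sensing: a conduction speaker excites the grasped object, contact microphones record the response, and an external contact changes how vibration propagates through the object. The summary does not give the authors' excitation signal or audio features, and their pipeline is learned end to end. The sketch below illustrates only the underlying intuition with a hand-crafted stand-in, assuming a multi-tone excitation and a comparison of per-frequency speaker-to-microphone gains against a free-space (no-contact) baseline. All function names and the `0.1` threshold are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def multitone(freqs: Sequence[float], amps: Sequence[float], fs: float, n: int) -> List[float]:
    """Sum of sines at the probe frequencies, each with its own amplitude.

    Used both for the emitted excitation (all amps = 1) and, in the usage
    example, to fake a received signal whose tones are scaled by path gains.
    """
    return [
        sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in zip(freqs, amps))
        for t in range(n)
    ]


def bin_magnitude(x: Sequence[float], f: float, fs: float) -> float:
    """Magnitude of a single DFT bin at frequency f, computed by direct correlation."""
    acc = sum(v * cmath.exp(-2j * math.pi * f * t / fs) for t, v in enumerate(x))
    return abs(acc) / len(x)


def response(tx: Sequence[float], rx: Sequence[float], freqs: Sequence[float], fs: float) -> List[float]:
    """Per-frequency gain |RX(f)| / |TX(f)| of the speaker-to-microphone path."""
    return [bin_magnitude(rx, f, fs) / bin_magnitude(tx, f, fs) for f in freqs]


def contact_score(gains: Sequence[float], baseline: Sequence[float]) -> float:
    """Mean absolute log-gain change relative to the no-contact baseline."""
    return sum(abs(math.log(g / b)) for g, b in zip(gains, baseline)) / len(gains)


def detect_contact(score: float, threshold: float = 0.1) -> bool:
    """Hypothetical decision rule: flag contact when the spectrum shifts enough."""
    return score > threshold


if __name__ == "__main__":
    fs, n = 8000.0, 800  # 0.1 s window; probe tones sit on exact 10 Hz DFT bins
    freqs = [500.0, 1000.0, 1500.0, 2000.0]
    tx = multitone(freqs, [1.0] * len(freqs), fs, n)
    # Simulated recordings: uniform damping in free space, extra damping at
    # 1000 Hz and 1500 Hz when the object touches something.
    free = multitone(freqs, [0.8, 0.8, 0.8, 0.8], fs, n)
    touch = multitone(freqs, [0.8, 0.3, 0.4, 0.8], fs, n)
    baseline = response(tx, free, freqs, fs)
    score = contact_score(response(tx, touch, freqs, fs), baseline)
    print(f"score={score:.3f} contact={detect_contact(score)}")
```

A spectral gain ratio is used here because it cancels the excitation amplitude and isolates the change in the object's acoustic path. A learned model, as the paper uses, can also exploit phase, transients, and correlations with the visual stream that this toy score ignores.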
Taxonomy
Topics: Tactile and Sensory Interactions
