UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
Zhengtong Xu, Yuki Shirai

TL;DR
UNIC is a data-driven multimodal framework for extrinsic contact estimation in manipulation tasks. It requires no prior knowledge or camera calibration and remains robust on unseen objects, under missing modalities, and with dynamic camera viewpoints.
Contribution
UNIC proposes a unified multimodal approach that encodes visual, proprioceptive, and tactile data without prior assumptions, improving generalization and robustness in contact estimation.
Findings
Achieves 9.6 mm average Chamfer distance error on unseen contact points
Performs reliably on unseen objects and under missing modalities
Remains robust with dynamic camera viewpoints
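For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of a Chamfer distance between a predicted and a ground-truth set of contact points. It assumes a symmetric variant that averages unsquared nearest-neighbor distances in both directions; the summary does not state which variant the paper uses, so the function name `chamfer_distance`, the averaging scheme, and the example coordinates are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point sets.

    Assumption: mean of unsquared nearest-neighbor distances, averaged
    over both directions. Other common variants sum instead of average,
    or use squared distances.
    """
    pred = np.asarray(pred, dtype=float)  # shape (N, D)
    gt = np.asarray(gt, dtype=float)      # shape (M, D)

    # Pairwise Euclidean distances, shape (N, M).
    diff = pred[:, None, :] - gt[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)

    # For each predicted point, distance to its closest ground-truth point.
    pred_to_gt = dists.min(axis=1).mean()
    # For each ground-truth point, distance to its closest predicted point.
    gt_to_pred = dists.min(axis=0).mean()

    return 0.5 * (pred_to_gt + gt_to_pred)


# Hypothetical contact points in millimetres (not data from the paper).
pred_points = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
gt_points = [[0.0, 0.0, 3.0], [10.0, 4.0, 0.0]]
error_mm = chamfer_distance(pred_points, gt_points)
```

With these made-up points the nearest-neighbor distances are 3 mm and 4 mm in each direction, so the sketch yields 3.5 mm. Under this variant, lower values mean the predicted contact set lies closer to the ground truth, and a prediction that misses part of the true contact region is penalized through the ground-truth-to-prediction term.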
Abstract
Contact-rich manipulation requires reliable estimation of extrinsic contacts, the interactions between a grasped object and its environment, which provide essential contextual information for planning, control, and policy learning. However, existing approaches often rely on restrictive assumptions, such as predefined contact types, fixed grasp configurations, or camera calibration, that hinder generalization to novel objects and deployment in unstructured environments. In this paper, we present UNIC, a unified multimodal framework for extrinsic contact estimation that operates without any prior knowledge or camera calibration. UNIC directly encodes visual observations in the camera frame and integrates them with proprioceptive and tactile modalities in a fully data-driven manner. It introduces a unified contact representation based on scene affordance maps that captures diverse contact…
Taxonomy
Topics: Robot Manipulation and Learning · Hand Gesture Recognition Systems · Teleoperation and Haptic Systems
