Cross-Modal Visuo-Tactile Object Perception
Anirvan Dutta, Simone Tasciotti, Claudia Cusseddu, Ang Li, Panayiota Poirazi, Julijana Gjorgjieva, Etienne Burdet, Patrick van der Smagt, Mohsen Kaboli

TL;DR
This paper introduces CMLF, a Bayesian framework, inspired by human perception, for dynamic cross-modal estimation of physical object properties from vision and touch. It improves robustness under uncertainty and reproduces human-like cross-modal perceptual illusions.
Contribution
The paper presents a novel structured latent state-space model that facilitates bidirectional cross-modal inference and temporal evolution of beliefs about object properties in robotics.
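As a rough illustration of what "temporal evolution of beliefs" from two modalities can mean, the sketch below tracks a Gaussian belief over one scalar object property and refines it with vision and touch readings as they arrive. This is a generic scalar Kalman-style filter and not the authors' CMLF model: it does not implement bidirectional cross-modal inference, and the names (`Belief`, `fuse_step`), the noise variances, and the sample readings are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Belief:
    """Gaussian belief over one scalar object property (hypothetical)."""
    mean: float
    var: float


def predict(b: Belief, process_var: float) -> Belief:
    # Time update: the property is assumed constant, so only uncertainty grows.
    return Belief(b.mean, b.var + process_var)


def update(b: Belief, obs: float, obs_var: float) -> Belief:
    # Standard Bayesian update of a Gaussian prior with a Gaussian observation.
    gain = b.var / (b.var + obs_var)
    return Belief(b.mean + gain * (obs - b.mean), (1.0 - gain) * b.var)


def fuse_step(
    b: Belief,
    vision_obs: Optional[float],
    touch_obs: Optional[float],
    vision_var: float = 4.0,   # assumed: vision is the noisier modality
    touch_var: float = 1.0,    # assumed: touch is the more precise modality
    process_var: float = 0.01,
) -> Belief:
    # One time step: predict, then fold in whichever modalities are available.
    b = predict(b, process_var)
    if vision_obs is not None:
        b = update(b, vision_obs, vision_var)
    if touch_obs is not None:
        b = update(b, touch_obs, touch_var)
    return b


# Made-up readings: (vision, touch); None marks a modality that is unavailable.
observations = [(5.2, None), (4.7, 5.1), (None, 4.9)]

belief = Belief(mean=0.0, var=100.0)  # vague prior
for vision_obs, touch_obs in observations:
    belief = fuse_step(belief, vision_obs, touch_obs)

print(belief)
```

Each observation is weighted by its assumed precision, so the belief variance shrinks as evidence accumulates and a missing modality simply skips its update.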
Findings
CMLF outperforms baseline methods in estimating physical properties under uncertainty.
The model demonstrates human-like perceptual coupling phenomena, including cross-modal illusions.
Real-world experiments validate the robustness and efficiency of the proposed approach.
Abstract
Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always be modeled precisely (e.g., deformation in non-rigid objects coupled with nonlinear contact friction), making the estimation problem inherently complex and requiring sustained exploitation of visuo-tactile sensory information during action. Existing visuo-tactile perception frameworks have primarily emphasized forceful sensor fusion or static cross-modal alignment, with limited consideration of how uncertainty and beliefs about object properties evolve over time. Inspired by human multi-sensory…
