Cross-Modal Visuo-Tactile Object Perception

Anirvan Dutta; Simone Tasciotti; Claudia Cusseddu; Ang Li; Panayiota Poirazi; Julijana Gjorgjieva; Etienne Burdet; Patrick van der Smagt; Mohsen Kaboli

arXiv:2604.02108 · cs.RO · April 3, 2026

TL;DR

This paper introduces CMLF, a Bayesian framework, inspired by human perception, for dynamic cross-modal perception of physical object properties using vision and touch. It improves robustness of the estimates and exhibits human-like cross-modal perceptual illusions.

Contribution

The paper presents a structured latent state-space model for robotic perception that supports bidirectional cross-modal inference between vision and touch and tracks how beliefs about object properties evolve over time.
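
To make the idea of a belief that evolves over time from two modalities concrete, here is a minimal sketch. It is not the CMLF model from the paper: it assumes a single scalar property (say, a stiffness value), Gaussian noise, and independent vision and touch readings, and it uses a plain scalar Kalman-style update. All function names, parameters, and noise values are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict(mean, var, process_var):
    """Time update: the property is assumed constant, so only uncertainty grows."""
    return mean, var + process_var

def update(mean, var, obs, obs_var):
    """Gaussian measurement update for one modality (scalar Kalman step)."""
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var

def track_belief(vision_obs, touch_obs, vision_var, touch_var,
                 prior_mean=0.0, prior_var=1.0, process_var=1e-3):
    """Fuse per-step vision and touch readings into a running Gaussian belief.

    A reading of None means that modality is unavailable at that step
    (for example, no contact yet), so only the other modality is used.
    """
    mean, var = prior_mean, prior_var
    history = []
    for v, t in zip(vision_obs, touch_obs):
        mean, var = predict(mean, var, process_var)
        if v is not None:
            mean, var = update(mean, var, v, vision_var)
        if t is not None:
            mean, var = update(mean, var, t, touch_var)
        history.append((mean, var))
    return history

# Hypothetical scenario: noisy vision from the start, precise touch after step 5.
rng = np.random.default_rng(0)
true_value = 2.0
steps = 20
vision = list(true_value + rng.normal(0.0, 0.5, steps))
touch = [None] * 5 + list(true_value + rng.normal(0.0, 0.1, steps - 5))
history = track_belief(vision, touch, vision_var=0.25, touch_var=0.01)
```

In this toy setting the belief variance shrinks slowly while only vision is available and drops sharply once touch readings arrive. The paper's structured latent model and its bidirectional cross-modal inference go beyond what this scalar filter captures.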

Findings

01. CMLF outperforms baseline methods in estimating physical properties under uncertainty.

02. The model demonstrates human-like perceptual coupling phenomena, including cross-modal illusions.

03. Real-world experiments validate the robustness and efficiency of the proposed approach.

Abstract

Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always be modeled precisely (e.g., deformation in non-rigid objects coupled with nonlinear contact friction), making the estimation problem inherently complex and requiring sustained exploitation of visuo-tactile sensory information during action. Existing visuo-tactile perception frameworks have primarily emphasized forceful sensor fusion or static cross-modal alignment, with limited consideration of how uncertainty and beliefs about object properties evolve over time. Inspired by human multi-sensory…
