Symmetry-Aware Fusion of Vision and Tactile Sensing via Bilateral Force Priors for Robotic Manipulation
Wonju Lee, Matteo Grimaldi, Tao Yu

TL;DR
This paper introduces a symmetry-aware visuo-tactile fusion method that pairs a Cross-Modal Transformer (CMT) with a physics-informed bilateral force regularizer, raising robotic insertion success on the TacSL benchmark to 96.59%.
Contribution
The paper proposes a Cross-Modal Transformer (CMT) with bilateral force regularization that stabilizes visuo-tactile fusion for robotic manipulation.
Findings
Achieves a 96.59% insertion success rate on the TacSL benchmark.
Outperforms naïve and gated fusion baselines.
Closely matches the privileged "wrist + contact force" configuration (96.09%).
Abstract
Insertion tasks in robotic manipulation demand precise, contact-rich interactions that vision alone cannot resolve. While tactile feedback is intuitively valuable, existing studies have shown that naïve visuo-tactile fusion often fails to deliver consistent improvements. In this work, we propose a Cross-Modal Transformer (CMT) for visuo-tactile fusion that integrates wrist-camera observations with tactile signals through structured self- and cross-attention. To stabilize tactile embeddings, we further introduce a physics-informed regularization that encourages bilateral force balance, reflecting principles of human motor control. Experiments on the TacSL benchmark show that CMT with symmetry regularization achieves a 96.59% insertion success rate, surpassing naïve and gated fusion baselines and closely matching the privileged "wrist + contact force" configuration (96.09%). These…
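The two ideas named in the abstract, cross-attention between modalities and a bilateral force-balance penalty, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the architecture or the loss, so the single-head attention without learned projections, the function names `cross_attention` and `bilateral_force_penalty`, the token shapes, and the squared-difference form of the penalty are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    """Single-head scaled dot-product cross-attention.

    Assumption: no learned projections; one modality's tokens (queries)
    attend over the other modality's tokens (keys_values).
    """
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_q, n_kv)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values                   # (n_q, d)

def bilateral_force_penalty(left_force, right_force):
    """Hypothetical symmetry penalty: mean squared difference between
    the force magnitudes estimated for the two fingers."""
    left_mag = np.linalg.norm(left_force, axis=-1)
    right_mag = np.linalg.norm(right_force, axis=-1)
    return float(np.mean((left_mag - right_mag) ** 2))

rng = np.random.default_rng(0)
vision_tokens = rng.normal(size=(4, 8))    # assumed wrist-camera tokens
tactile_tokens = rng.normal(size=(6, 8))   # assumed tactile tokens
fused = cross_attention(vision_tokens, tactile_tokens)

# Equal and opposite grip forces incur no penalty; unequal ones do.
balanced = bilateral_force_penalty(np.array([[0.0, 0.0, 2.0]]),
                                   np.array([[0.0, 0.0, -2.0]]))
unbalanced = bilateral_force_penalty(np.array([[0.0, 0.0, 3.0]]),
                                     np.array([[0.0, 0.0, -1.0]]))
```

In a training setup such a penalty would typically be added to the task loss with a small weight; the weighting and the way forces are derived from tactile embeddings are left open here because the abstract does not specify them.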
Taxonomy
Topics: Advanced Sensor and Energy Harvesting Materials · Tactile and Sensory Interactions · Robot Manipulation and Learning
