ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations
Zhiyuan Wu, Yongqiang Zhao, Shan Luo

TL;DR
ConViTac uses contrastive learning to better align visual and tactile features in robotic perception, significantly improving performance on material classification and grasping tasks.
Contribution
The paper proposes a Contrastive Embedding Conditioning (CEC) mechanism that improves visual-tactile feature fusion by combining contrastive representations with cross-modal attention.
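To make the cross-modal attention idea concrete, here is a minimal sketch of scaled dot-product attention in which queries come from one modality and keys/values from the other. This is an illustration under stated assumptions, not the authors' architecture: the function names `softmax` and `cross_attention` are hypothetical, there are no learned projections, and this summary does not specify how CEC wires the contrastive embedding into the attention.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Attend from one modality (queries) to another (keys/values).

    Assumption: plain scaled dot-product attention without learned
    projection matrices; a real model would add them.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Hypothetical toy data: one "visual" query attending over two "tactile" tokens.
out = cross_attention([[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

Because the query matches the first key far more strongly than the second, the output lies close to the first value vector.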
Findings
Improves accuracy by up to 12.0% on material classification and grasping tasks
Outperforms current state-of-the-art methods
Demonstrates effectiveness of contrastive embedding in modality alignment
Abstract
Vision and touch are two fundamental sensory modalities for robots, offering complementary information that enhances perception and manipulation tasks. Previous research has attempted to jointly learn visual-tactile representations to extract more meaningful information. However, these approaches often rely on direct combination, such as feature addition and concatenation, for modality fusion, which tends to result in poor feature integration. In this paper, we propose ConViTac, a visual-tactile representation learning network designed to enhance the alignment of features during fusion using contrastive representations. Our key contribution is a Contrastive Embedding Conditioning (CEC) mechanism that leverages a contrastive encoder pretrained through self-supervised contrastive learning to project visual and tactile inputs into unified latent embeddings. These embeddings are used to…
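As a rough illustration of what contrastive pretraining over paired visual and tactile embeddings can look like, the sketch below computes a symmetric InfoNCE loss. The abstract does not state which contrastive objective the encoder uses, so InfoNCE is an assumption here, as are the names `l2_normalize` and `info_nce` and the temperature value.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def info_nce(visual, tactile, temperature=0.1):
    """Symmetric InfoNCE over a batch; (visual[i], tactile[i]) is a positive pair.

    Assumption: this is a generic contrastive objective, not necessarily
    the one used to pretrain the encoder described in the paper.
    """
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in tactile]
    n = len(v)
    # Cosine-similarity logits between every visual and tactile embedding.
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the visual-to-tactile and tactile-to-visual directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

# Hypothetical toy embeddings: aligned pairs vs. deliberately swapped pairs.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs the aligned pairs yield a loss near zero while the swapped pairs yield a large loss, which is the behavior that pushes matching visual and tactile samples toward a shared embedding.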
Taxonomy
Topics: Architecture and Computational Design · Tactile and Sensory Interactions · Interactive and Immersive Displays
Methods: Contrastive Learning
