ConViTac: Aligning Visual-Tactile Fusion with Contrastive Representations

Zhiyuan Wu; Yongqiang Zhao; Shan Luo

arXiv:2506.20757·cs.CV·June 27, 2025

TL;DR

ConViTac introduces a contrastive learning-based approach for better alignment of visual and tactile features in robotic perception, significantly improving performance in material classification and grasping tasks.

Contribution

It proposes a novel Contrastive Embedding Conditioning mechanism that enhances visual-tactile feature fusion through contrastive representations and cross-modal attention.
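As a rough illustration of the cross-modal attention mentioned above, the following is a minimal NumPy sketch of generic scaled dot-product attention in which tokens from one modality attend to tokens from the other. It is not the authors' CEC implementation, whose details are not given in this summary. The function name `cross_modal_attention`, the token counts, the feature width, and the random projection matrices are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, context, w_q, w_k, w_v):
    """Generic scaled dot-product attention (hypothetical helper).

    queries: (n_q, d_model) tokens from one modality, e.g. vision.
    context: (n_c, d_model) tokens from the other modality, e.g. touch.
    Returns the attended features (n_q, d_model) and the weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

# Toy inputs: shapes and values are assumptions, not taken from the paper.
rng = np.random.default_rng(0)
d_model = 16
visual_tokens = rng.normal(size=(6, d_model))
tactile_tokens = rng.normal(size=(4, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

fused, weights = cross_modal_attention(
    visual_tokens, tactile_tokens, w_q, w_k, w_v
)
print(fused.shape, weights.shape)
```

Here each visual token receives a weighted mixture of tactile features, which is one way a fusion step can go beyond plain addition or concatenation.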

Findings

01. Improves accuracy by up to 12.0% in key tasks

02. Outperforms current state-of-the-art methods

03. Demonstrates effectiveness of contrastive embedding in modality alignment

Abstract

Vision and touch are two fundamental sensory modalities for robots, offering complementary information that enhances perception and manipulation tasks. Previous research has attempted to jointly learn visual-tactile representations to extract more meaningful information. However, these approaches often rely on direct combination, such as feature addition and concatenation, for modality fusion, which tends to result in poor feature integration. In this paper, we propose ConViTac, a visual-tactile representation learning network designed to enhance the alignment of features during fusion using contrastive representations. Our key contribution is a Contrastive Embedding Conditioning (CEC) mechanism that leverages a contrastive encoder pretrained through self-supervised contrastive learning to project visual and tactile inputs into unified latent embeddings. These embeddings are used to…
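To make the idea of contrastive pretraining into a shared latent space more concrete, here is a minimal NumPy sketch of a symmetric InfoNCE loss over paired visual and tactile embeddings. The abstract as shown here does not say which contrastive objective ConViTac uses, so InfoNCE is an assumption, as are the function name `info_nce`, the temperature of 0.1, and the toy embedding sizes.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def info_nce(visual_emb, tactile_emb, temperature=0.1):
    """Symmetric InfoNCE loss (hypothetical helper, not from the paper).

    Row i of visual_emb and row i of tactile_emb form a positive pair;
    every other row in the batch serves as a negative.
    """
    v = l2_normalize(visual_emb)
    t = l2_normalize(tactile_emb)
    logits = v @ t.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(v))
    loss_v2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2v = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_v2t + loss_t2v)

# Toy embeddings: sizes and noise level are assumptions for illustration.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 32))
aligned = info_nce(visual, visual + 0.01 * rng.normal(size=(8, 32)))
shuffled = info_nce(visual, rng.normal(size=(8, 32)))
print(aligned < shuffled)
```

The loss is low when matching visual and tactile samples land close together in the latent space and high when they are unrelated, which is the property an encoder pretrained this way is pushed toward.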

Taxonomy

Topics: Architecture and Computational Design · Tactile and Sensory Interactions · Interactive and Immersive Displays

Methods: Contrastive Learning