TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception
Kailin Lyu, Long Xiao, Jianing Zeng, Junhao Dong, Xuexin Liu, Zhuojun Zou, Haoyue Yang, Lin Shu, Jie Hao

TL;DR
TouchFormer is a transformer-based framework that makes multimodal material perception more robust by adaptively fusing cross-modal features and regularizing embeddings, improving classification accuracy, particularly in real-world robotic applications.
Contribution
The paper introduces TouchFormer, a robust multimodal fusion framework that combines a Modality-Adaptive Gating (MAG) mechanism with Cross-Instance Embedding Regularization to address modality-specific noise and missing modalities.
Findings
Achieves accuracy improvements of 2.48% and 6.83% on the SSMC and USMC tasks, respectively.
Demonstrates effectiveness in real-world robotic perception scenarios.
Validates robustness against modality noise and missing data.
Abstract
Traditional vision-based material perception methods often experience substantial performance degradation under visually impaired conditions, thereby motivating the shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance…
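The abstract does not spell out how Modality-Adaptive Gating is computed, so the following is only a minimal sketch of the general idea of gated fusion under missing modalities, not the authors' implementation. It assumes each modality yields a fixed-length feature vector and a scalar relevance score; the function name `gated_fusion`, the modality names, and the score values are all hypothetical. Missing modalities are dropped before a softmax over the scores, so the remaining weights still sum to one.

```python
import math


def gated_fusion(features, gate_scores):
    """Fuse per-modality feature vectors with softmax gate weights.

    features:    dict mapping modality name -> list of floats,
                 or None when that modality is missing.
    gate_scores: dict mapping modality name -> float relevance score
                 (hypothetical; a trained model would predict these).
    Returns (fused_vector, weights) where weights covers only the
    modalities that were actually present.
    """
    present = [m for m, f in features.items() if f is not None]
    if not present:
        raise ValueError("at least one modality must be present")

    # Softmax restricted to present modalities (max subtracted for stability).
    max_score = max(gate_scores[m] for m in present)
    exps = {m: math.exp(gate_scores[m] - max_score) for m in present}
    total = sum(exps.values())
    weights = {m: exps[m] / total for m in present}

    # Weighted sum of the present feature vectors, element by element.
    dim = len(features[present[0]])
    fused = [
        sum(weights[m] * features[m][i] for m in present)
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical example: three modalities, one of them missing.
example_features = {"audio": [1.0, 0.0], "accel": [0.0, 1.0], "force": None}
example_scores = {"audio": 0.0, "accel": 0.0, "force": 5.0}
fused, weights = gated_fusion(example_features, example_scores)
```

Because the missing modality is excluded before normalization, its score has no effect on the result, which is one simple way a gate can tolerate absent inputs.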
Taxonomy
Topics: Advanced Neural Network Applications · Robot Manipulation and Learning · Multimodal Machine Learning Applications
