TouchFormer: A Robust Transformer-based Framework for Multimodal Material Perception

Kailin Lyu; Long Xiao; Jianing Zeng; Junhao Dong; Xuexin Liu; Zhuojun Zou; Haoyue Yang; Lin Shu; Jie Hao

arXiv:2511.19509 · cs.LG · November 26, 2025

TL;DR

TouchFormer is a transformer-based framework that makes multimodal material perception more robust by adaptively fusing cross-modal features and regularizing embeddings. It improves classification accuracy on benchmark tasks and is shown to work in real-world robotic perception scenarios.

Contribution

The paper introduces TouchFormer, a robust multimodal fusion framework that combines a Modality-Adaptive Gating (MAG) mechanism with Cross-Instance Embedding Regularization to handle modality-specific noise and missing modalities.
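To make the idea of gated fusion with missing modalities concrete, here is a minimal sketch in plain Python. It is not the paper's MAG mechanism, whose details are not given on this page. It only illustrates one common pattern: score each available modality, normalize the scores with a softmax over the modalities that are actually present, and take the weighted sum. The function name `gated_fusion`, the modality names, and the gate parameters are all hypothetical.

```python
import math


def gated_fusion(features, gate_weights):
    """Fuse per-modality feature vectors with softmax gates (illustrative only).

    features: dict of modality name -> list of floats, or None if missing.
    gate_weights: dict of modality name -> list of floats (hypothetical
        gate parameters; a real model would learn these).
    Returns (fused_vector, gates).
    """
    # Score each available modality with a dot product; missing ones are skipped.
    scores = {
        name: sum(w * x for w, x in zip(gate_weights[name], vec))
        for name, vec in features.items()
        if vec is not None
    }
    if not scores:
        raise ValueError("at least one modality must be present")

    # Numerically stable softmax over the available modalities only,
    # so the gates still sum to 1 when a modality is absent.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    gates = {name: e / total for name, e in exps.items()}

    # Gate-weighted sum of the available feature vectors.
    dim = len(features[next(iter(gates))])
    fused = [0.0] * dim
    for name, g in gates.items():
        for i, x in enumerate(features[name]):
            fused[i] += g * x
    return fused, gates


# Hypothetical example: one of three modalities is missing.
example_features = {"audio": [1.0, 0.0], "accel": [0.0, 1.0], "force": None}
example_weights = {"audio": [0.0, 0.0], "accel": [0.0, 0.0], "force": [0.0, 0.0]}
example_fused, example_gates = gated_fusion(example_features, example_weights)
```

Because the softmax is taken only over present modalities, dropping an input redistributes its weight to the remaining ones instead of injecting a zero vector into the fused representation.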

Findings

01 Achieves accuracy improvements of 2.48% and 6.83% on the SSMC and USMC tasks, respectively.

02 Demonstrates effectiveness in real-world robotic perception scenarios.

03 Validates robustness against modality noise and missing data.

Abstract

Traditional vision-based material perception methods often experience substantial performance degradation under visually impaired conditions, thereby motivating the shift toward non-visual multimodal material perception. Despite this, existing approaches frequently perform naive fusion of multimodal inputs, overlooking key challenges such as modality-specific noise, missing modalities common in real-world scenarios, and the dynamically varying importance of each modality depending on the task. These limitations lead to suboptimal performance across several benchmark tasks. In this paper, we propose a robust multimodal fusion framework, TouchFormer. Specifically, we employ a Modality-Adaptive Gating (MAG) mechanism and intra- and inter-modality attention mechanisms to adaptively integrate cross-modal features, enhancing model robustness. Additionally, we introduce a Cross-Instance…

Taxonomy

Topics: Advanced Neural Network Applications · Robot Manipulation and Learning · Multimodal Machine Learning Applications