VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing

Junyi Zong, Qingxuan Jia, Meixian Shi, Tong Li, Jiayuan Li, Zihang Lv, Gang Chen, and Fang Deng

arXiv:2604.03322·cs.CV·April 7, 2026

TL;DR

VitaTouch is a multimodal model that combines vision, tactile, and language data to improve material-property inference and defect recognition in robotic manufacturing inspection. It achieves the best performance on HCT and the overall TVL benchmark.

Contribution

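The paper introduces VitaTouch, a property-aware vision-tactile-language model, and VitaSet, a multimodal dataset of 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark and, with fine-tuning, reaches 100% defect recognition accuracy.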

Findings

01

VitaTouch achieves 88.89% hardness accuracy on VitaSet.

02

It reaches 75.13% roughness accuracy and 54.81% descriptor recall.

03

It attains 100% defect recognition accuracy with fine-tuning.

Abstract

Quality inspection in smart manufacturing requires identifying intrinsic material and surface properties beyond visible geometry, yet vision-only methods remain vulnerable to occlusion and reflection. We propose VitaTouch, a property-aware vision-tactile-language model for material-property inference and natural-language attribute description. VitaTouch uses modality-specific encoders and a dual Q-Former to extract language-relevant visual and tactile features, which are compressed into prefix tokens for a large language model. We align each modality with text and explicitly couple vision and touch through contrastive learning. We also construct VitaSet, a multimodal dataset with 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark, while remaining competitive on SSVTP. On VitaSet, it…
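The abstract says vision and touch are explicitly coupled through contrastive learning but does not give the loss. As a rough illustration only, the sketch below implements a symmetric InfoNCE objective, a common choice for this kind of cross-modal alignment. The function names, the temperature of 0.07, and the toy embeddings are all assumptions for illustration, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(vision, touch, temperature=0.07):
    """Hypothetical vision-touch coupling loss (assumed, not the paper's).

    vision[i] and touch[i] are embeddings of the same object; every other
    pairing in the batch is treated as a negative.
    """
    v = [l2_normalize(x) for x in vision]
    t = [l2_normalize(x) for x in touch]
    # Cosine similarity between every vision/touch pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the vision-to-touch and touch-to-vision directions.
    return 0.5 * (
        diagonal_cross_entropy(logits)
        + diagonal_cross_entropy(logits_transposed)
    )


# Toy batch of two objects with 2-D embeddings (made-up values).
vision = [[1.0, 0.0], [0.0, 1.0]]
touch_aligned = [[0.9, 0.1], [0.1, 0.9]]   # matches the vision ordering
touch_swapped = [[0.1, 0.9], [0.9, 0.1]]   # deliberately mismatched

loss_aligned = symmetric_info_nce(vision, touch_aligned)
loss_swapped = symmetric_info_nce(vision, touch_swapped)
print(loss_aligned < loss_swapped)
```

In this toy batch the loss is lower when each tactile embedding sits close to the visual embedding of the same object, which is the behavior a contrastive coupling term is meant to reward.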

Code & Models

Repositories

https://vitatouch.github.io