VTFusion: A Vision-Text Multimodal Fusion Network for Few-Shot Anomaly Detection

Yuxin Jiang; Yunkang Cao; Yuqi Cheng; Yiheng Zhang; Weiming Shen

arXiv:2601.16381 · cs.CV · January 26, 2026

TL;DR

VTFusion introduces a multimodal fusion network that leverages domain-specific vision-text features and a dedicated fusion module to improve few-shot anomaly detection in industrial settings.

Contribution

The paper presents a novel vision-text fusion framework with adaptive feature extractors and a specialized fusion module tailored for industrial anomaly detection.

Findings

01. Achieves 96.8% AUROC on the MVTec AD dataset in the 2-shot setting

02. Attains 86.2% AUROC on the VisA dataset

03. Reaches 93.5% AUPRO on an industrial automotive parts dataset
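
For readers unfamiliar with the AUROC metric reported above, the following is a minimal sketch of how it can be computed from per-image anomaly scores. It is not the paper's evaluation code: the function name `auroc` and the toy scores and labels are hypothetical, chosen only for illustration. AUROC is treated here as the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as half.

```python
def auroc(scores, labels):
    """Pairwise (rank-based) AUROC; labels use 1 = anomalous, 0 = normal."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous sample ranked above normal sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical image-level anomaly scores and ground-truth labels.
scores = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(f"{100 * auroc(scores, labels):.1f}% AUROC")  # 8 of 9 pairs ordered correctly -> 88.9%
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; benchmark evaluations typically rely on a library routine instead.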

Abstract

Few-Shot Anomaly Detection (FSAD) has emerged as a critical paradigm for identifying irregularities using scarce normal references. While recent methods have integrated textual semantics to complement visual data, they predominantly rely on features pre-trained on natural scenes, thereby neglecting the granular, domain-specific semantics essential for industrial inspection. Furthermore, prevalent fusion strategies often resort to superficial concatenation, failing to address the inherent semantic misalignment between visual and textual modalities, which compromises robustness against cross-modal interference. To bridge these gaps, this study proposes VTFusion, a vision-text multimodal fusion framework tailored for FSAD. The framework rests on two core designs. First, adaptive feature extractors for both image and text modalities are introduced to learn task-specific representations,…

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications