ACD-CLIP: Decoupling Representation and Dynamic Fusion for Zero-Shot Anomaly Detection

Ke Ma; Jun Long; Hongxiao Fei; Liujie Hua; Zhen Dai; Yueyi Luo

arXiv:2508.07819·cs.CV·March 30, 2026

TL;DR

This paper introduces ACD-CLIP, a framework for zero-shot anomaly detection that jointly refines feature representation and cross-modal fusion, and reports improved accuracy and robustness on industrial and medical benchmarks.

Contribution

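The paper proposes a co-designed architecture that pairs Conv-LoRA, a parameter-efficient convolutional low-rank adapter that injects local inductive biases, with a Dynamic Fusion Gateway (DFG) that uses visual context to adaptively modulate text prompts. Together they target two limitations of pre-trained vision-language models: missing local inductive biases for dense prediction and inflexible feature fusion.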

Findings

01 Achieves superior accuracy on diverse benchmarks.

02 Demonstrates robustness in industrial and medical anomaly detection.

03 Validates the importance of joint feature refinement and dynamic fusion.

Abstract

Pre-trained Vision-Language Models (VLMs) struggle with Zero-Shot Anomaly Detection (ZSAD) due to a critical adaptation gap: they lack the local inductive biases required for dense prediction and employ inflexible feature fusion paradigms. We address these limitations through an Architectural Co-Design framework that jointly refines feature representation and cross-modal fusion. Our method proposes a parameter-efficient Convolutional Low-Rank Adaptation (Conv-LoRA) adapter to inject local inductive biases for fine-grained representation, and introduces a Dynamic Fusion Gateway (DFG) that leverages visual context to adaptively modulate text prompts, enabling a powerful bidirectional fusion. Extensive experiments on diverse industrial and medical benchmarks demonstrate superior accuracy and robustness, validating that this synergistic co-design is critical for robustly adapting foundation…
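The two components named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the depthwise 3x3 convolution inside the low-rank bottleneck, the mean-pooled visual context, the sigmoid gate, the 0.07 temperature, and every function and variable name below are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def conv_lora_delta(tokens, grid_hw, down, kernel, up):
    """Hypothetical Conv-LoRA update: a low-rank path with a 3x3 depthwise
    convolution in the bottleneck, so neighbouring patches influence each other.
    tokens: (H*W, D), down: (D, r), kernel: (3, 3, r), up: (r, D)."""
    h, w = grid_hw
    z = (tokens @ down).reshape(h, w, -1)          # project to rank r, restore patch grid
    padded = np.pad(z, ((1, 1), (1, 1), (0, 0)))   # zero-pad so the output stays H x W
    mixed = np.zeros_like(z)
    for dy in range(3):
        for dx in range(3):
            # depthwise: each of the r channels has its own 3x3 weights
            mixed += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx]
    return mixed.reshape(h * w, -1) @ up           # project back to dimension D

def dynamic_fusion(tokens, text_embeds, gate_w, temperature=0.07):
    """Hypothetical fusion gate: visual context rescales the text prompts,
    then each patch is matched against the adapted prompts.
    text_embeds: (2, D) ordered as [normal, abnormal]; gate_w: (D, D)."""
    context = tokens.mean(axis=0)                      # global visual context, shape (D,)
    gate = 1.0 / (1.0 + np.exp(-(context @ gate_w)))   # sigmoid gate in (0, 1)
    adapted = text_embeds * (1.0 + gate)               # vision -> text modulation
    t = adapted / np.linalg.norm(adapted, axis=1, keepdims=True)
    v = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    logits = (v @ t.T) / temperature                   # text -> vision matching
    logits -= logits.max(axis=1, keepdims=True)        # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, 1]                                 # per-patch "abnormal" probability

rng = np.random.default_rng(0)
H, W, D, R = 4, 4, 16, 2                    # toy sizes, not taken from the paper
tokens = rng.normal(size=(H * W, D))        # stand-in for frozen ViT patch tokens
down = rng.normal(size=(D, R)) * 0.1
kernel = rng.normal(size=(3, 3, R)) * 0.1
up = np.zeros((R, D))                       # LoRA-style zero init: adapter starts as a no-op
refined = tokens + conv_lora_delta(tokens, (H, W), down, kernel, up)

text_embeds = rng.normal(size=(2, D))       # stand-in for [normal, abnormal] prompt embeddings
gate_w = rng.normal(size=(D, D)) * 0.1
anomaly_map = dynamic_fusion(refined, text_embeds, gate_w).reshape(H, W)
```

In this sketch the backbone tokens are treated as frozen and only `down`, `kernel`, `up`, and `gate_w` would be trained, which is what makes the adapter parameter-efficient. How the actual method places the adapters, builds the gate, and aggregates layers is described in the paper and repository, not here.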

Code & Models

Repositories

cockmake/ACD-CLIP (GitHub)
