MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval

Chaoran Xu; Chengkan Lv; Qiyu Chen; Feng Zhang; Zhengtao Zhang

arXiv:2602.00522·cs.CV·February 3, 2026

Open Access · 3 Reviews

TL;DR

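MRAD is a memory-retrieval framework for zero-shot anomaly detection. Instead of fitting the data distribution with prompt learning or complex modeling, it stores feature-label pairs from auxiliary data explicitly and scores test images by similarity retrieval; the authors report that it outperforms existing methods.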

Contribution

The paper proposes MRAD, a novel train-free memory retrieval framework for zero-shot anomaly detection, replacing parametric fitting with explicit memory banks for improved cross-domain stability.

Findings

01. Superior performance on 16 datasets in anomaly detection tasks.

02. Effective zero-shot detection without training, using memory retrieval.

03. Enhanced generalization via lightweight variants MRAD-FT and MRAD-CLIP.

Abstract

Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods use prompt learning or complex modeling to fit the data distribution, resulting in high training or inference cost and limited cross-domain stability. To address these limitations, we propose the Memory-Retrieval Anomaly Detection method (MRAD), a unified framework that replaces parametric fitting with direct memory retrieval. The train-free base model, MRAD-TF, freezes the CLIP image encoder and constructs a two-level memory bank (image-level and pixel-level) from auxiliary data, where feature-label pairs are explicitly stored as keys and values. During inference, anomaly scores are obtained directly by similarity retrieval over the memory bank. Building on MRAD-TF, we further propose two lightweight variants as enhancements: (i) MRAD-FT fine-tunes the retrieval…
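The retrieval step described in the abstract can be illustrated with a toy key-value memory. This is a minimal sketch and not the authors' implementation: the excerpt does not give the scoring rule, so the softmax-weighted label average, the `temperature` value, the function names, and the 2-D toy features are all assumptions made for illustration. In MRAD-TF the keys would be frozen CLIP image-encoder features taken from auxiliary data.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_score(query, keys, values, temperature=0.07):
    """Hypothetical scoring rule: similarity-weighted average of stored labels.

    keys   -- stored feature vectors (the memory-bank keys)
    values -- stored labels, 0.0 for normal and 1.0 for anomalous
    """
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(k))) for k in keys]
    weights = softmax([s / temperature for s in sims])
    return sum(w * v for w, v in zip(weights, values))


# Toy image-level memory: two normal keys and two anomalous keys (assumed data).
image_keys = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
image_values = [0.0, 0.0, 1.0, 1.0]

normal_like = retrieve_score([1.0, 0.05], image_keys, image_values)
anomalous_like = retrieve_score([0.05, 1.0], image_keys, image_values)

# Pixel-level retrieval reuses the same rule once per patch feature,
# which yields one score per patch (a coarse anomaly map).
patch_features = [[1.0, 0.0], [0.2, 0.8], [0.95, 0.05]]
anomaly_map = [retrieve_score(p, image_keys, image_values) for p in patch_features]
```

Nothing is trained in this sketch: the memory is only written once and then queried, which is the sense in which the base model is described as train-free. A real pixel-level bank would hold patch features with per-pixel labels rather than reusing the image-level keys as done here.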

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. MRAD reframes zero-shot anomaly detection as a non-parametric retrieval problem rather than a traditional model-fitting task. The proposed two-level (image- and pixel-level) memory bank is conceptually simple yet effective, marking a meaningful departure from existing CLIP-based prompt-learning approaches.
2. The experimental evaluation is extensive, covering 16 datasets across both industrial and medical domains. The results demonstrate the robustness and generalization ability of the propo

Weaknesses

1. While the empirical results are strong, the paper lacks theoretical justification or analytical insight into why a retrieval-based framework can outperform traditional parametric fitting approaches. The memory mechanism has been explored extensively in few-shot anomaly detection, and the novelty here lies in extending it to the zero-shot setting. Therefore, the authors should provide a more detailed discussion on why and how features extracted from the source domain can generalize effectively

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The framework is simple and effective.
2. The paper is clearly written and easy to follow.
3. Extensive experiments on both industrial and medical benchmarks support the claims.

Weaknesses

1. The fine-tuning stage adopts two linear projection layers, but no ablation compares against shallower (e.g., 1-layer) or deeper variants, making it unclear whether the chosen depth is optimal or arbitrary.
2. The method emphasizes the benefit of two-level memory (image + pixel), but there is no ablation where one level is removed to show whether both levels are truly necessary.
3. No sensitivity analysis is provided for key hyperparameters (e.g., similarity mask ratio ρ, top-k selection, th

Reviewer 03 · Rating 6 · Confidence 4

Strengths

This paper proposes a novel approach to ZSAD that replaces parametric fitting with direct memory retrieval, offering a fresh perspective on anomaly detection. It demonstrates soundness in both theoretical grounding and empirical validation. The paper also gives clear definitions and explanations of its methodology, making it accessible to readers.

Weaknesses

Major:
1. All experiments use VisA or MVTec-AD as the auxiliary dataset. Could other datasets be used as the auxiliary dataset?
2. MRAD-CLIP injects region priors as additive biases into CLIP’s learnable prompts. It remains unclear whether this design choice is optimal or merely sufficient.
3. MRAD-FT adds 2.76M parameters, but the fine-tuning efficiency remains under-explored, which leaves the “lightweight” claim inadequately quantified.

Minor:
1. Memory bank size scales with the auxi

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning