Dual Distillation for Few-Shot Anomaly Detection

Le Dong; Qinzhong Tan; Chunlei Li; Jingliang Hu; Yilei Shi; Weisheng Dong; Xiao Xiang Zhu; Lichao Mou

arXiv:2603.01713·cs.CV·March 3, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces D$^2$4FAD, a dual distillation framework for few-shot anomaly detection in medical imaging, achieving state-of-the-art results with limited normal reference images across diverse organs and modalities.

Contribution

The paper proposes a novel dual distillation approach with a learn-to-weight mechanism for improved few-shot anomaly detection in medical images.

Findings

01. Outperforms existing methods on a large multi-organ benchmark

02. Achieves significant improvements in anomaly detection accuracy

03. Demonstrates robustness across different organs and imaging modalities

Abstract

Anomaly detection is a critical task in computer vision with profound implications for medical imaging, where identifying pathologies early can directly impact patient outcomes. While recent unsupervised anomaly detection approaches show promise, they require substantial normal training data and struggle to generalize across anatomical contexts. We introduce D$^2$4FAD, a novel dual distillation framework for few-shot anomaly detection that identifies anomalies in previously unseen tasks using only a small number of normal reference images. Our approach leverages a pre-trained encoder as a teacher network to extract multi-scale features from both support and query images, while a student decoder learns to distill knowledge from the teacher on query images and self-distill on support images. We further propose a learn-to-weight mechanism that dynamically assesses the reference value of…
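The abstract is cut off before it says how anomalies are scored, so the following is only a minimal sketch of the general teacher-student idea, not the authors' formulation. It assumes that a per-location cosine distance between teacher and student features serves as the anomaly map and that the maximum over locations gives the image-level score; the function names and the toy feature vectors are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v + 1e-12)

def anomaly_map(teacher_feats, student_feats):
    """Assumed scoring rule: one discrepancy value per spatial location."""
    return [cosine_distance(t, s) for t, s in zip(teacher_feats, student_feats)]

def image_score(amap):
    """Assumed aggregation: the largest discrepancy is the image-level score."""
    return max(amap)

# Toy features at three spatial locations (hypothetical values).
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]  # disagrees at the last location

amap = anomaly_map(teacher, student)
score = image_score(amap)
```

In this toy case the student matches the teacher at the first two locations and is orthogonal to it at the third, so the discrepancy is concentrated at the third location.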

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

- Clear motivation and task definition for few-shot anomaly detection in medical settings.
- Architecture is simple, fast, and avoids large generative models.
- Strong image-level AUROC across multiple datasets and shot settings.

Weaknesses

- The work repeatedly emphasizes “dual distillation” as a key contribution, but the process does not fully match established definitions of distillation in the literature. Since the teacher network is frozen, and the student is not learning logits or semantic knowledge but merely reconstructing features, the term distillation may be overstated. This weakens the conceptual positioning of the contribution: the method is an anomaly-detection reconstruction framework rather than a genuine knowledge-

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The paper introduces a clear and well-motivated dual-distillation framework (D$^2$4FAD) for few-shot anomaly detection, combining a teacher–student distillation mechanism with an additional student self-distillation path and a learn-to-weight module that adaptively re-weights support images conditioned on the query. While the core components (knowledge distillation, few-shot learning) are known, their integration into a unified few-shot medical anomaly detection framework is novel and concept

Weaknesses

The main limitation of the paper lies in the formulation of the task. Although the work is presented as addressing few-shot anomaly detection, the evaluation is restricted to image-level AUROC, effectively turning the problem into a binary classification task (normal versus abnormal). While the model internally produces anomaly maps, no quantitative localization results are provided (e.g., Dice, IoU, or AUPRO). This simplification reduces the methodological complexity of the problem and limits t
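For readers unfamiliar with the metric the reviewer refers to, image-level AUROC can be computed as the probability that a randomly chosen abnormal image receives a higher anomaly score than a randomly chosen normal one, with ties counted as one half. The sketch below is a generic pairwise implementation of that definition; the scores and labels are made-up illustration values, not results from the paper.

```python
def auroc(scores, labels):
    """Pairwise AUROC: fraction of (abnormal, normal) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # abnormal images
    neg = [s for s, y in zip(scores, labels) if y == 0]  # normal images
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one abnormal image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical image-level anomaly scores and ground-truth labels.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
value = auroc(scores, labels)  # 3 of 4 pairs are ranked correctly -> 0.75
```

Because this metric only ranks whole images, it says nothing about where in the image the anomaly lies, which is the gap the reviewer points to when asking for Dice, IoU, or AUPRO.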

Reviewer 03 · Rating 6 · Confidence 3

Strengths

Clear Motivation and Problem Formulation: The paper does an excellent job of motivating the need for few-shot anomaly detection in clinical practice, grounding the research in a real-world problem. The formalization of the FAD task is clear and precise.

Elegant and Effective Method: The D$^2$4FAD framework is simple yet powerful. The dual distillation concept is intuitive and well-justified. By using a frozen pre-trained encoder as the teacher, the method is parameter-efficient and avoids the need

Weaknesses

Limited Technical Depth in "Learn-to-Weight": While the "learn-to-weight" mechanism (Eq. 4) is a good idea, its presentation is somewhat brief. It is essentially a scaled dot-product attention between the query and support features. The paper could benefit from a deeper analysis or discussion of this component. For example, are there other ways to instantiate this weighting? How does this mechanism behave in practice (e.g., does it learn to ignore outlier-like support images)?

Sensitivity to th
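The reviewer's reading of the learn-to-weight mechanism as scaled dot-product attention can be illustrated with a small sketch. Eq. 4 of the paper is not reproduced on this page, so this shows generic scaled dot-product weighting under assumptions, not the authors' exact formula: each image is assumed to be summarized by one feature vector, and the function names and toy vectors are hypothetical.

```python
import math

def support_weights(query_feat, support_feats):
    """Softmax over scaled dot products between a query and K support vectors."""
    d = len(query_feat)
    scores = [
        sum(q * s for q, s in zip(query_feat, feat)) / math.sqrt(d)
        for feat in support_feats
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_reference(support_feats, weights):
    """Weighted average of support vectors (an assumed use of the weights)."""
    d = len(support_feats[0])
    return [sum(w * feat[i] for w, feat in zip(weights, support_feats)) for i in range(d)]

# Toy query and three support vectors: aligned, orthogonal, and opposed.
query = [1.0, 0.0, 1.0, 0.0]
support = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [-1.0, 0.0, -1.0, 0.0],
]
weights = support_weights(query, support)
reference = weighted_reference(support, weights)
```

Under this formulation a support image whose features oppose the query receives the smallest weight, which is one way to probe the reviewer's question about whether outlier-like support images are down-weighted.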

Taxonomy

Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning