Continual-MEGA: A Large-scale Benchmark for Generalizable Continual Anomaly Detection

Geonu Lee; Yujeong Oh; Geonhui Jang; Soyoung Lee; Jeonghyo Song; Sungmin Cha; YoungJoon Yoo

arXiv:2506.00956 · cs.CV · February 9, 2026

Open Access 3 Reviews

TL;DR

This paper introduces Continual-MEGA, a comprehensive benchmark for continual anomaly detection that includes a large dataset and a new zero-shot generalization scenario, aiming to improve robustness and real-world applicability.

Contribution

The paper presents a new large-scale benchmark, Continual-MEGA, with a novel zero-shot generalization scenario and a unified baseline algorithm for continual anomaly detection.

Findings

1. Existing methods need improvement, especially in pixel-level defect localization.
2. The proposed method outperforms prior approaches.
3. The ContinualAD dataset improves anomaly detection performance.

Abstract

In this paper, we introduce a new benchmark for continual learning in anomaly detection, aimed at better reflecting real-world deployment scenarios. Our benchmark, Continual-MEGA, includes a large and diverse dataset that significantly expands existing evaluation settings by combining carefully curated existing datasets with our newly proposed dataset, ContinualAD. In addition to the standard continual learning setting at this expanded scale, we propose a novel scenario that measures zero-shot generalization to unseen classes, i.e., classes not observed during continual adaptation. This scenario poses a new problem in which continual adaptation is also expected to enhance zero-shot performance. We also present a unified baseline algorithm that improves robustness in few-shot detection and maintains strong generalization. Through extensive evaluations, we report three key findings: (1) existing methods show…
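The zero-shot scenario in the abstract keeps the classes used for continual adaptation disjoint from the unseen classes used for evaluation. The Python sketch below illustrates that protocol under stated assumptions: the helper names (`make_continual_zero_shot_split`, `run_protocol`), the random class partition, and the `adapt`/`evaluate` callables are hypothetical and do not reflect the benchmark's actual task composition or the authors' code.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def make_continual_zero_shot_split(
    classes: Sequence[str],
    num_tasks: int,
    num_unseen: int,
    seed: int = 0,
) -> Tuple[List[List[str]], List[str]]:
    """Hypothetical helper: shuffle the class names, hold out `num_unseen`
    of them for zero-shot evaluation, and split the rest into `num_tasks`
    contiguous continual-adaptation tasks."""
    if not 0 < num_tasks <= len(classes) - num_unseen:
        raise ValueError("each task needs at least one seen class")
    rng = random.Random(seed)
    shuffled = list(classes)
    rng.shuffle(shuffled)
    unseen = shuffled[:num_unseen]  # never used for adaptation
    seen = shuffled[num_unseen:]
    n = len(seen)
    # Contiguous, near-equal chunks; none is empty because num_tasks <= n.
    tasks = [seen[i * n // num_tasks:(i + 1) * n // num_tasks] for i in range(num_tasks)]
    return tasks, unseen


def run_protocol(
    tasks: List[List[str]],
    unseen: List[str],
    adapt: Callable[[List[str]], None],
    evaluate: Callable[[List[str]], float],
) -> List[Dict[str, object]]:
    """After each adaptation step, score every task seen so far (to track
    forgetting) and the held-out unseen classes (zero-shot generalization)."""
    history: List[Dict[str, object]] = []
    for step, task in enumerate(tasks):
        adapt(task)  # placeholder: update the detector on this task only
        seen_scores = [evaluate(tasks[j]) for j in range(step + 1)]
        history.append({"step": step, "seen": seen_scores, "zero_shot": evaluate(unseen)})
    return history
```

For example, splitting ten hypothetical class names into four tasks with two unseen classes yields four tasks of two classes each, none of which overlaps the unseen pair; the unseen classes are evaluated after every step but never passed to `adapt`.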

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 5

Strengths

- A new large-scale benchmark that unifies multiple AD datasets and defines reproducible task streams.
- Dataset and benchmark release (if completed) could be a useful resource for the community.

Weaknesses

- The paper claims that existing methods fail in continual AD, but Table 3 shows that MVFA (CVPR 2024 Spotlight), a non-continual zero-shot VLM-based method, performs competitively with the proposed ADCT. This contradicts the central claim that new continual-learning methods are required. If a zero-shot method performs as well as the proposed continual method, the necessity of the benchmark and ADCT is not established.
- Evaluation in Scenario 2/3 artificially disadvantages MVFA, leading to inva…

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. Large, diverse benchmark with realistic continual and CZSL settings that better reflect deployment.
2. Broad, carefully reported comparisons across method families with appropriate metrics (image AUROC, pixel AP, forgetting).
3. Clear empirical takeaways on generalization vs. forgetting, highlighting where current methods break.
4. A strong, reproducible CLIP-based baseline that others can extend; code/benchmark availability increases impact.
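Reviewer 02 lists forgetting alongside image AUROC and pixel AP. As an illustration only, the sketch below computes the average-forgetting measure commonly used in the continual-learning literature from a lower-triangular score matrix, whose entries could be any per-task metric such as image AUROC. Whether Continual-MEGA uses exactly this definition is an assumption, and `average_forgetting` is a hypothetical helper.

```python
from typing import Sequence


def average_forgetting(perf: Sequence[Sequence[float]]) -> float:
    """Hypothetical helper. perf[i][j] is the score on task j measured after
    adapting on task i (only j <= i is defined). Forgetting on task j is its
    best score before the final step minus its score after the final step,
    averaged over every task except the last one."""
    num_steps = len(perf)
    if num_steps < 2:
        return 0.0  # nothing can be forgotten after a single task
    final = perf[num_steps - 1]
    drops = []
    for j in range(num_steps - 1):
        # Best score task j ever reached before the last adaptation step.
        best_before = max(perf[i][j] for i in range(j, num_steps - 1))
        drops.append(best_before - final[j])
    return sum(drops) / len(drops)
```

With `perf = [[0.90], [0.85, 0.92], [0.80, 0.88, 0.95]]`, task 0 drops by 0.10 and task 1 by 0.04, so the average forgetting is about 0.07. Taking the maximum over earlier steps (rather than only the score right after learning a task) is the usual choice because it measures the loss relative to the best performance a task ever achieved.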

Weaknesses

1. Training-budget mismatch likely benefits the proposed baseline; needs a strictly matched compute comparison.
2. No ablations to disentangle the effects of adapters, mixing strategy, and synthetic feature generation.
3. Limited documentation of the new dataset in the main text (how anomalies are obtained, per-class stats, representative examples).
4. Task split construction and order sensitivity are under-specified, making reproducibility and robustness hard to assess.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper proposes a new problem within the domain of Anomaly Detection, which is relevant to practical applications of AD in industry. The proposed method produces strong performance.

Weaknesses

The paper is quite challenging to read and could be better structured to present information in a more logical and more easily comprehensible way. Table 1 seems redundant given that the same information is also present in Table 2. The tables and figures use font sizes that are not clearly legible. The caption for Fig. 4 fails to fully describe what is shown, and the 7 colored backgrounds in this figure do not seem to correspond to the 4 subsets of tasks described in the corresponding main te…

Taxonomy

Topics: Anomaly Detection Techniques and Applications