A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection
Jiangning Zhang, Haoyang He, Zhenye Gan, Qingdong He, Yuxuan Cai, Zhucun Xue, Yabiao Wang, Chengjie Wang, Lei Xie, Yong Liu

TL;DR
This paper introduces ADer, a comprehensive, extensible benchmark framework with multiple datasets, methods, and metrics for evaluating multi-class visual anomaly detection, significantly improving evaluation efficiency and consistency.
Contribution
It presents ADer, a modular benchmark platform with diverse datasets and methods, and the GPU-assisted ADEval package to accelerate evaluation, addressing current evaluation biases and inefficiencies.
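To make concrete the kind of metric such an evaluation package computes, here is a minimal sketch of image-level AUROC, a metric commonly used in anomaly detection. It is an illustration only: the function name `auroc` and the pairwise formulation are assumptions, and nothing here reflects how ADEval is actually implemented or which metrics it accelerates.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (hypothetical helper).

    labels: iterable of 0 (normal) / 1 (anomalous)
    scores: iterable of anomaly scores, higher means more anomalous
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")
    # Fraction of (anomalous, normal) pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy inputs for illustration only, not data from the paper.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form costs O(P x N) comparisons, which is the sort of cost that grows quickly for pixel-level metrics on large datasets and motivates faster evaluation code.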
Findings
Identified strengths and weaknesses of existing methods
Demonstrated the effectiveness of GPU acceleration in evaluation
Provided insights into challenges and future directions in the field
Abstract
Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
The focus on establishing a standardized evaluation protocol is both timely and relevant, addressing a critical gap in the AD field. The authors' provision of an integrated experimental setup offers a valuable foundation for future AD research. The paper also presents compelling experimental results, including analyses of stability, cross-domain dataset correlations, and more.
1. Alignment with ICLR Scope: While ADer provides a valuable standardized framework, the paper’s relevance to ICLR could be strengthened. Including a discussion of representation learning techniques in AD, or integrating such techniques into the benchmark, could better align the work with ICLR’s focus. Exploring representation learning might also broaden the utility of ADer by connecting it to core themes in machine learning and anomaly detection research. 2. Justification for Multi-class Focus
- The authors address the issue of potentially unfair comparisons due to the lack of unified code evaluation for different methods in the Multi-class Anomaly Detection (AD) setting by proposing the extensible ADer library. This library implements 15 state-of-the-art anomaly detection methods on 11 popular datasets with 9 comprehensive evaluation metrics. - The developed ADEval GPU acceleration package significantly improves evaluation speed for large-scale datasets. - Comprehensive benchmark res
- Reproducibility of experimental errors: The benchmark results are based on single-run outcomes without calculating the standard deviation from repeated runs. What is the rationale behind this approach? Providing an analysis of randomness and reproducibility would enhance the credibility of the paper. - The meaning of evaluation metrics should be explained in detail. - Support for Zero-/Few-shot Anomaly Detection (AD) tasks: The authors primarily focus on multi-class AD. Can the proposed method
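The repeated-run reporting this reviewer asks for can be sketched as follows. This is a minimal illustration, not the authors' protocol: `summarize_runs` is a hypothetical helper, and the scores are placeholder values rather than results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of one metric over repeated runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder metric values from three hypothetical seeds (not from the paper).
placeholder_scores = [0.90, 0.92, 0.94]
mean, std = summarize_runs(placeholder_scores)
print(f"{mean:.3f} +/- {std:.3f}")
```

Reporting the spread alongside the mean shows whether a gap between two methods exceeds run-to-run noise.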
1. ADer, a comprehensive and fair benchmark, is proposed for the Visual Anomaly Detection (VAD) field to foster its sustainable and healthy development. 2. ADer offers a convenient and fair approach for evaluating anomaly detection methods. It is designed as a highly scalable, modular framework that seamlessly integrates with existing techniques. The framework includes datasets spanning three domains—industrial, medical, and general-purpose—and incorporates fifteen state-of-the-art anomaly detec
1. The novelty is not at the ICLR level. The primary aim of this paper is to integrate methods, datasets, and evaluation metrics in the field of visual anomaly detection to create a comprehensive modular framework. All the methods, datasets, and evaluation metrics included are based on existing technologies. 2. Although the ADer framework is the core content of this paper, it is not described in enough detail. Only Figure 3 briefly describes the core sub-modules of the framework. The main purpose of AD
The paper proposes a library of tools for unified and efficient evaluation of anomaly detection methods. As the authors note, the resources available for evaluation are not efficiently implemented. This can lead to two problems, either a waste of time due to inefficient evaluation, or a non-reproducible and inconsistent evaluation due to differences in metric implementation. The provision of a unified and efficient tool in this article will address both of these problems if the tool is adopted b
The article compares many methods, on many datasets, with many metrics to give a complete assessment. However, the description of these methods, datasets, and metrics is very superficial. The authors should provide a brief overview of the key innovations for each method, summarize the distinguishing characteristics of the datasets, and explain the strengths and limitations of each evaluation metric. The methods are introduced in one paragraph and then quickly grouped into four broad categories.
Taxonomy
Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Currency Recognition and Detection
