A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection

Jiangning Zhang; Haoyang He; Zhenye Gan; Qingdong He; Yuxuan Cai; Zhucun Xue; Yabiao Wang; Chengjie Wang; Lei Xie; Yong Liu

arXiv:2406.03262 · cs.CV · August 26, 2025 · 1 citation

TL;DR

This paper introduces ADer, a comprehensive, extensible benchmark framework with multiple datasets, methods, and metrics for evaluating multi-class visual anomaly detection, significantly improving evaluation efficiency and consistency.

Contribution

It presents ADer, a modular benchmark platform with diverse datasets and methods, and the GPU-assisted ADEval package to accelerate evaluation, addressing current evaluation biases and inefficiencies.
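One common way to make a benchmark "modular" and extensible is a name-to-factory registry, so that a new method can be added without touching the evaluation loop. The sketch below illustrates that general pattern only; `METHODS`, `register`, `build`, and `MeanScore` are hypothetical names chosen for illustration and are not taken from ADer's code.

```python
from typing import Callable, Dict

# Hypothetical registry; all names here are illustrative, not ADer's API.
METHODS: Dict[str, Callable[[], object]] = {}


def register(name: str):
    """Decorator that records a method factory under a string key."""
    def wrap(factory):
        if name in METHODS:
            raise KeyError(f"method {name!r} already registered")
        METHODS[name] = factory
        return factory
    return wrap


@register("mean_score")
class MeanScore:
    """Toy detector: scores an image by its mean pixel value."""

    def score(self, pixels):
        return sum(pixels) / len(pixels)


def build(name: str):
    """Instantiate a registered method by name, e.g. from a config entry."""
    return METHODS[name]()


detector = build("mean_score")
print(detector.score([0.0, 1.0]))
```

With this layout, the benchmark loop only needs the string key, which is what lets every method run under the same datasets, resolution, and training schedule.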

Findings

01

Identified strengths and weaknesses of existing methods

02

Demonstrated the effectiveness of GPU acceleration in evaluation

03

Provided insights into challenges and future directions in the field

Abstract

Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen…

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 5 · Confidence 4

Strengths

The focus on establishing a standardized evaluation protocol is both timely and relevant, addressing a critical gap in the AD field. The authors' provision of an integrated experimental setup offers a valuable foundation for future AD research. The paper also presents compelling experimental results, including analyses of stability, cross-domain dataset correlations, and more.

Weaknesses

1. Alignment with ICLR Scope: While ADer provides a valuable standardized framework, the paper’s relevance to ICLR could be strengthened. Including a discussion of representation learning techniques in AD, or integrating such techniques into the benchmark, could better align the work with ICLR’s focus. Exploring representation learning might also broaden the utility of ADer by connecting it to core themes in machine learning and anomaly detection research.
2. Justification for Multi-class Focus

Reviewer 02 · Rating 8 · Confidence 4

Strengths

- The author addresses the issue of potentially unfair comparisons due to the lack of unified code evaluation for different methods in the Multi-class Anomaly Detection (AD) setting by proposing the extensible ADer library. This library implements 15 state-of-the-art anomaly detection methods on 11 popular datasets with 9 comprehensive evaluation metrics.
- The developed ADEval GPU acceleration package significantly improves evaluation speed for large-scale datasets.
- Comprehensive benchmark res

Weaknesses

- Reproducibility of experimental errors: The benchmark results are based on single-run outcomes without calculating the standard deviation from repeated runs. What is the rationale behind this approach? Providing an analysis of randomness and reproducibility would enhance the credibility of the paper.
- The meaning of evaluation metrics should be explained in detail.
- Support for Zero-/Few-shot Anomaly Detection (AD) tasks: The authors primarily focus on multi-class AD. Can the proposed method
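The reviewer's first point, reporting variation across repeated runs, can be sketched as follows. This is a minimal illustration, not the paper's protocol: `run_once` is a hypothetical stand-in that returns a made-up metric value for a given seed, where a real benchmark would train and evaluate a model.

```python
import random
import statistics


def run_once(seed: int) -> float:
    """Hypothetical stand-in for one train-and-evaluate run.

    Returns a fake metric in [0.90, 0.95] that depends only on the seed.
    """
    rng = random.Random(seed)
    return 0.90 + 0.05 * rng.random()


def summarize(seeds):
    """Return (mean, sample standard deviation) of the metric over seeds."""
    scores = [run_once(s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


mean, std = summarize([0, 1, 2, 3, 4])
print(f"metric: {mean:.4f} +/- {std:.4f} over 5 seeds")
```

Reporting the spread alongside the mean makes it possible to tell whether a gap between two methods exceeds run-to-run noise.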

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. ADer, a comprehensive and fair benchmark, is proposed for the Visual Anomaly Detection (VAD) field to foster its sustainable and healthy development.
2. ADer offers a convenient and fair approach for evaluating anomaly detection methods. It is designed as a highly scalable, modular framework that seamlessly integrates with existing techniques. The framework includes datasets spanning three domains (industrial, medical, and general-purpose) and incorporates fifteen state-of-the-art anomaly detec

Weaknesses

1. The novelty is not at the ICLR level. The primary aim of this paper is to integrate methods, datasets, and evaluation metrics in the field of visual anomaly detection to create a comprehensive modular framework. All the methods, datasets, and evaluation metrics included are based on existing technologies.
2. Although it is the core content of this paper, the ADer framework is not described in enough detail. Only Figure 3 briefly describes the core sub-modules of the framework. The main purpose of AD

Reviewer 04 · Rating 3 · Confidence 4

Strengths

The paper proposes a library of tools for unified and efficient evaluation of anomaly detection methods. As the authors note, the resources available for evaluation are not efficiently implemented. This can lead to two problems, either a waste of time due to inefficient evaluation, or a non-reproducible and inconsistent evaluation due to differences in metric implementation. The provision of a unified and efficient tool in this article will address both of these problems if the tool is adopted b

Weaknesses

The article compares many methods, on many datasets, with many metrics to give a complete assessment. However, the description of these methods, datasets, and metrics is very superficial. The authors should provide a brief overview of the key innovations for each method, summarize the distinguishing characteristics of the datasets, and explain the strengths and limitations of each evaluation metric. The methods are introduced in one paragraph and then quickly grouped into four broad categories.
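As an example of the kind of metric explanation the reviewer asks for, the sketch below computes AUROC, a threshold-free ranking metric widely used in anomaly detection, directly from its definition: the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one. It is a plain-Python illustration written for clarity, not the implementation in ADer or ADEval, and the function name `auroc` is chosen here for illustration.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition.

    labels: 0 for normal, 1 for anomalous.
    scores: higher means more anomalous.
    Ties between an anomalous and a normal score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four (anomalous, normal) pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(P·N) comparisons, which is fine for image-level scores but becomes slow at pixel level, where every pixel is a sample. A known limitation is that AUROC can look optimistic when anomalies are a tiny fraction of the samples, which is one reason benchmarks usually report several complementary metrics.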

Code & Models

Repositories

zhangzjn/ader
PyTorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI · Currency Recognition and Detection