TL;DR
UniAIDet is a comprehensive benchmark for detecting and localizing AI-generated images across diverse models and image types, addressing limitations of previous benchmarks and enabling robust evaluation of detection methods.
Contribution
We introduce UniAIDet, a unified benchmark covering diverse generative models and image categories, facilitating comprehensive evaluation of detection and localization methods.
Findings
UniAIDet covers a wide range of generative models and image types.
Evaluation reveals strengths and weaknesses of current detection methods.
The benchmark supports future research in AI-generated image detection.
Abstract
With the rapid proliferation of image generative models, the authenticity of digital images has become a significant concern. While existing studies have proposed various methods for detecting AI-generated content, current benchmarks are limited in their coverage of diverse generative models and image categories, often overlooking end-to-end image editing and artistic images. To address these limitations, we introduce UniAIDet, a unified and comprehensive benchmark that includes both photographic and artistic images. UniAIDet covers a wide range of generative models, including text-to-image, image-to-image, image inpainting, image editing, and deepfake models. Using UniAIDet, we conduct a comprehensive evaluation of various detection methods and answer three key research questions regarding generalization capability and the relation between detection and localization. Our benchmark and…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper is clearly written. The experiments are presented in an organized and systematic way, with diverse forms of visualization, including a large number of statistical figures.
- The authors conduct a unified evaluation of multiple existing detectors, and the experimental results are comprehensive.
- The paper claims UniAIDet as “the first large-scale, wide-coverage benchmark … covering most potential practical scenarios,” emphasizing breadth across both holistic and partial synthesis, plus localization. However, [1] similarly targets localization, cross-domain generalization, and includes explicit explanatory annotations. A more direct comparison with [1] is needed.
- The dataset construction process involves no human validation. Although an NSFW detector is applied to filter real images,
1. It mainly covers full synthesis and partial synthesis. It also includes photo, art, and localization tasks, making it very comprehensive.
2. The authors provide clear mathematical definitions for detection vs. localization, centered on the criterion of "whether pixels are produced by a generative model" rather than on artifact-based methods. This makes the research direction more rigorous.
1. The paper focuses more on empirical comparisons and lacks mechanistic explanations. Why is partial synthesis particularly difficult to detect? The feature differences across different model types (frequency/spatial/semantic) are not visualized in depth.
2. It primarily considers generalization in detection. What about robustness, e.g., to attacks such as cropping followed by regeneration or secondary editing (regeneration attacks)?
1. The proposed dataset is diverse and comprehensive. It covers various generative methods (both holistic and partial synthesis), a broad range of generative models, and both realistic and artistic images.
2. The benchmarking of existing detection and localization methods reveals some interesting phenomena, for example, (1) some methods (e.g., NPR, AIDE) perform consistently in detecting realistic and artistic images, even though they are trained solely on realistic ones; (2) models train
1. A major limitation of Sec. 4 is the **lack of in-depth analysis** of the evaluation results and observations. Limited insights are provided regarding the reasons behind the phenomena and possible future directions for improving the detection and localization methods.
2. The claim that the proposed UniAIDet is "the **first** large-scale, wide-coverage benchmark AI-generated image content detection and localization" (Lines 96-97) needs to be more specific about the coverage and uniqueness (e.g.
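One review notes that the paper defines detection and localization by whether pixels are produced by a generative model. The following is a minimal Python sketch of how such a pixel-level criterion could yield both targets, assuming a boolean per-pixel mask is available for each image. The mask representation and the helper names `detection_label`, `localization_target`, and `synthesis_type` are hypothetical illustrations, not the paper's definitions or code.

```python
from typing import List

# Assumption: True marks a pixel produced by a generative model.
Mask = List[List[bool]]


def detection_label(mask: Mask) -> int:
    """Image-level target (hypothetical): 1 if any pixel is generated, else 0."""
    return int(any(any(row) for row in mask))


def localization_target(mask: Mask) -> Mask:
    """Pixel-level target (hypothetical): a copy of the generated-pixel mask."""
    return [list(row) for row in mask]


def synthesis_type(mask: Mask) -> str:
    """Hypothetical helper: label an image as real, partial, or holistic synthesis."""
    flat = [p for row in mask for p in row]
    if not any(flat):
        return "real"      # no generated pixels
    if all(flat):
        return "holistic"  # every pixel generated (e.g., text-to-image)
    return "partial"       # some pixels generated (e.g., inpainting)


# Usage: a 4x4 image with a 1x2 inpainted region.
inpainted = [[False] * 4 for _ in range(4)]
inpainted[1][1] = inpainted[1][2] = True
print(detection_label(inpainted), synthesis_type(inpainted))
```

Under this assumption, a fully generated image and an inpainted one receive the same detection label, and only the localization target tells them apart, which is why localization carries information that an image-level label does not.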
