Automated In-the-Wild Data Collection for Continual AI Generated Image Detection

Thanasis Pantsios; Dimitrios Karageorgiou; Christos Koutlis; George Karantaidis; Olga Papadopoulou; Symeon Papadopoulos

arXiv:2605.02567·cs.CV·May 5, 2026

TL;DR

This paper presents a data-centric continual learning framework that automatically collects in-the-wild data and incorporates generator-driven data to improve AI-generated image detection under evolving conditions.

Contribution

It introduces an automated, weakly supervised pipeline for dataset construction and demonstrates effective continual adaptation to new generative models.

Findings

01. Achieved +9.14% and +8% improvements in average accuracy on two detectors.

02. Showed that combining in-the-wild and generator-driven data enhances robustness.

03. Validated the approach through extensive experiments.

Abstract

The rapid advancement of generative Artificial Intelligence (AI) has introduced significant challenges for reliable AI-generated image detection. Existing detectors often suffer from performance degradation under distribution shifts and when encountering newly emerging generative models. In this work, we propose a data-centric continual adaptation framework for updating detectors in evolving environments. We show that both in-the-wild data and generator-driven data are essential for adapting detectors. We introduce an automated, weakly supervised pipeline for constructing in-the-wild datasets through fact-check article retrieval. Additionally, we demonstrate that incorporating even a small amount of generator-driven data during training enables effective adaptation to newly emerging models, while combining it with in-the-wild data within a continual learning framework enables robust…
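The data-mixing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_update_set`, the replay pool, and the two fraction parameters are hypothetical choices made only for illustration, and the abstract does not specify how the two data sources are actually combined. The sketch assumes each sample is an (image identifier, label) pair, and that one detector update uses all newly collected in-the-wild samples plus a small share of generator-driven samples and some samples kept from earlier updates.

```python
import random
from typing import List, Tuple

# (image identifier, label): 1 = AI-generated, 0 = real
Sample = Tuple[str, int]


def build_update_set(
    in_the_wild: List[Sample],
    generator_driven: List[Sample],
    replay: List[Sample],
    generator_fraction: float = 0.1,  # hypothetical default, not from the paper
    replay_fraction: float = 0.2,     # hypothetical default, not from the paper
    seed: int = 0,
) -> List[Sample]:
    """Assemble the training set for one continual-update step.

    All in-the-wild samples are kept. Generator-driven and replay samples
    are drawn without replacement, sized relative to the in-the-wild pool
    and capped by how many samples each pool actually holds.
    """
    rng = random.Random(seed)
    n_wild = len(in_the_wild)
    n_gen = min(len(generator_driven), round(generator_fraction * n_wild))
    n_replay = min(len(replay), round(replay_fraction * n_wild))

    mixed = (
        list(in_the_wild)
        + rng.sample(generator_driven, n_gen)
        + rng.sample(replay, n_replay)
    )
    rng.shuffle(mixed)  # avoid ordering the update set by data source
    return mixed
```

With 100 in-the-wild samples, 50 generator-driven samples, and a replay pool of 10, the default settings yield an update set of 120 samples: all 100 in-the-wild items, 10 generator-driven items, and the full replay pool (capped at its size of 10). Fixing the seed keeps the selection reproducible across runs.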

Code & Models

Datasets

pthan12/AIGenImages2026 (dataset, 25 downloads)
