Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image Detectors
Anisha Pal, Julia Kruk, Mansi Phute, Manognya Bhattaram, Diyi Yang, Duen Horng Chau, Judy Hoffman

TL;DR
SEMI-TRUTHS is a large-scale dataset of real and AI-augmented images designed to evaluate the robustness of AI-generated image detectors across various perturbations and data distributions.
Contribution
We introduce SEMI-TRUTHS, a comprehensive dataset with diverse augmentations and metadata for standardized evaluation of detector robustness.
Findings
Detectors show varying sensitivities to different perturbations.
Performance depends on augmentation types and data distributions.
Insights reveal limitations and biases in current detection methods.
Abstract
Text-to-image diffusion models have impactful applications in art, design, and entertainment, yet these technologies also pose significant risks by enabling the creation and dissemination of misinformation. Although recent advancements have produced AI-generated image detectors that claim robustness against various augmentations, their true effectiveness remains uncertain. Do these detectors reliably identify images with different levels of augmentation? Are they biased toward specific scenes or data distributions? To investigate, we introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images containing targeted, localized perturbations produced using diverse augmentation techniques, diffusion models, and data distributions. Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness. Our…
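Per-image metadata is what makes targeted evaluation possible: detector outputs can be broken down by any metadata field, such as augmentation method or diffusion model, to see where accuracy drops. Below is a minimal sketch in Python, assuming each record carries a ground-truth `label` (1 = AI-augmented, 0 = real), a binary detector prediction `pred`, and metadata fields. The field names `method` and `model`, the helper `accuracy_by_group`, and all values are hypothetical and are not the dataset's actual schema.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping


def accuracy_by_group(
    records: Iterable[Mapping[str, object]], key: str
) -> Dict[Hashable, float]:
    """Detection accuracy within each group of records sharing the same `key` value.

    Hypothetical schema: each record has a ground-truth `label`
    (1 = AI-augmented, 0 = real), a binary prediction `pred`, and
    metadata fields such as `method` or `model`.
    """
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        # Count the prediction as correct only if it matches the ground truth.
        correct[group] += int(r["pred"] == r["label"])
    return {g: correct[g] / total[g] for g in total}


# Toy records with made-up values, for illustration only.
records = [
    {"label": 1, "pred": 1, "method": "inpainting", "model": "model_a"},
    {"label": 1, "pred": 0, "method": "inpainting", "model": "model_b"},
    {"label": 1, "pred": 1, "method": "prompt_edit", "model": "model_a"},
    {"label": 1, "pred": 1, "method": "prompt_edit", "model": "model_b"},
    {"label": 0, "pred": 0, "method": "none", "model": "none"},
]

print(accuracy_by_group(records, "method"))
```

Grouping by another field, e.g. `accuracy_by_group(records, "model")`, gives the same breakdown per diffusion model. Accuracy on the real images should be reported alongside the augmented groups, since a detector that labels everything as AI-generated would score perfectly on every augmented-only group.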
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI
Methods: Diffusion
