Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors

Anisha Pal; Julia Kruk; Mansi Phute; Manognya Bhattaram; Diyi Yang; Duen Horng Chau; Judy Hoffman

arXiv:2411.07472·cs.CV·November 13, 2024

TL;DR

SEMI-TRUTHS is a large-scale dataset of real and AI-augmented images designed to evaluate the robustness of AI-generated image detectors across various perturbations and data distributions.

Contribution

We introduce SEMI-TRUTHS, a comprehensive dataset with diverse augmentations and metadata for standardized evaluation of detector robustness.

Findings

01. Detectors show varying sensitivities to different perturbations.

02. Performance depends on augmentation types and data distributions.

03. Insights reveal limitations and biases in current detection methods.

Abstract

Text-to-image diffusion models have impactful applications in art, design, and entertainment, yet these technologies also pose significant risks by enabling the creation and dissemination of misinformation. Although recent advancements have produced AI-generated image detectors that claim robustness against various augmentations, their true effectiveness remains uncertain. Do these detectors reliably identify images with different levels of augmentation? Are they biased toward specific scenes or data distributions? To investigate, we introduce SEMI-TRUTHS, featuring 27,600 real images, 223,400 masks, and 1,472,700 AI-augmented images that feature targeted and localized perturbations produced using diverse augmentation techniques, diffusion models, and data distributions. Each augmented image is accompanied by metadata for standardized and targeted evaluation of detector robustness. Our…
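The per-image metadata described above lends itself to grouped evaluation. The following is a minimal sketch of that idea, not the authors' evaluation code: the field names (`augmentation`, `change_size`, `predicted_fake`) and the toy records are hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict


def recall_by_group(records, group_key):
    """Return, per metadata group, the fraction of AI-augmented images
    that a detector flagged as fake (i.e. detection recall)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        hits[group] += int(rec["predicted_fake"])
    return {group: hits[group] / totals[group] for group in totals}


# Toy, made-up records: every image here is AI-augmented, and
# "predicted_fake" is a hypothetical detector's binary decision.
records = [
    {"augmentation": "inpainting", "change_size": "small", "predicted_fake": False},
    {"augmentation": "inpainting", "change_size": "large", "predicted_fake": True},
    {"augmentation": "prompt_editing", "change_size": "small", "predicted_fake": True},
    {"augmentation": "prompt_editing", "change_size": "large", "predicted_fake": True},
]

by_augmentation = recall_by_group(records, "augmentation")
by_change_size = recall_by_group(records, "change_size")
```

Slicing the same predictions along different metadata keys is what would let one check whether a detector is more sensitive to one kind of augmentation, or to smaller edits, than to another.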

Code & Models

Repositories

j-kruk/semitruths · PyTorch · Official

Datasets

semi-truths/Semi-Truths · dataset · 647 dl

Videos

Semi-Truths: A Large-Scale Dataset of AI-Augmented Images for Evaluating Robustness of AI-Generated Image detectors · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI

Methods: Diffusion