GPT-Image-2 in the Wild: A Twitter Dataset of Self-Reported AI-Generated Images from the First Week of Deployment

Kidus Zewde; Simiao Ren; Xingyu Shen; Jiaqi Wu; Yuchen Zhou; Tommy Duong; Zikang Zhang; Ethan Traister; Kewen Xie

arXiv:2604.25370·cs.CV·May 7, 2026

TL;DR

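This paper introduces the first published dataset of GPT-image-2 generated images, collected from public Twitter/X posts in the six days after the model's April 21, 2026 release. It characterizes the dataset's content and examines why image provenance is hard to verify on social media.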

Contribution

The paper presents a curated dataset of 10,217 confirmed GPT-image-2 images drawn from 27,662 collected Twitter/X records, together with a multi-stage curation pipeline, content analyses, and observations on the limits of provenance verification.

Findings

01. 82.0% of images contain detectable text.

02. 59.2% of images contain faces, with 22,583 faces in total.

03. Twitter's CDN strips content credentials, hindering provenance verification.
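
To make the third finding concrete, here is a minimal sketch of how one might check whether a downloaded JPEG still carries an embedded content-credentials (C2PA) manifest. It assumes the manifest is stored in JPEG APP11 segments under a box labelled `c2pa`. The function names `app11_payloads` and `has_c2pa_hint` are hypothetical and not taken from the paper. This is a presence heuristic only: it does not parse or validate a manifest, which would require dedicated C2PA tooling.

```python
import struct


def app11_payloads(data: bytes) -> list[bytes]:
    """Collect the payloads of APP11 (0xFFEB) segments in a JPEG header.

    Walks the marker segments from the start of the file and stops at the
    start-of-scan (0xFFDA) or end-of-image (0xFFD9) marker.
    """
    if data[:2] != b"\xff\xd8":  # not a JPEG (missing SOI marker)
        return []
    payloads = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break  # malformed header, stop scanning
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:  # length field includes its own two bytes
            break
        if marker == 0xEB:
            payloads.append(data[pos + 4:pos + 2 + length])
        pos += 2 + length
    return payloads


def has_c2pa_hint(data: bytes) -> bool:
    """Heuristic: True if any APP11 payload mentions the 'c2pa' label."""
    return any(b"c2pa" in payload for payload in app11_payloads(data))
```

Running `has_c2pa_hint` on the same image before upload and after download from the CDN would show whether the APP11 segments survived re-encoding; a `False` result on the downloaded copy is consistent with stripping but does not prove the original carried credentials.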

Abstract

The release of GPT-image-2 by OpenAI marks a watershed moment in AI-generated imagery: the boundary between photographic reality and synthetic content has never been more difficult to discern. We introduce the GPT-Image-2 Twitter Dataset, the first published dataset of GPT-image-2 generated images, sourced from publicly available Twitter/X posts in the immediate aftermath of the model's April 21, 2026 release. Leveraging the Twitter API v2 and a multi-stage curation pipeline spanning multilingual text heuristics (English, Japanese, and Chinese), browser-automated Twitter "Made with AI" badge verification, and model name variant matching, we curate 10,217 confirmed GPT-image-2 images from 27,662 collected records over a six-day window. We characterize the dataset across four analyses: CLIP-based zero-shot subject taxonomy, OCR text legibility (82.0% of images contain detectable text),…
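
The text-based stages of the curation pipeline (multilingual heuristics plus model name variant matching) can be illustrated with a short sketch. Everything below is an assumption made for illustration: the regular expression, the cue phrases, and the function names are hypothetical and are not the authors' actual rules, which the abstract does not specify.

```python
import re
import unicodedata

# Hypothetical variant pattern: matches spellings such as "gpt-image-2",
# "GPT Image 2", "gpt_image_2", and "gptimage2", but not "gpt-image-20".
MODEL_PATTERN = re.compile(r"gpt[\s\-_]*image[\s\-_]*2(?!\d)", re.IGNORECASE)

# Hypothetical self-report cue phrases per language (illustrative only).
GENERATION_CUES = {
    "en": ("made with", "generated with", "generated by", "created with"),
    "ja": ("で生成", "で作成", "で作った"),
    "zh": ("生成的", "生成了", "制作的"),
}


def mentions_model(text: str) -> bool:
    """True if the text names the model in any of the assumed spellings."""
    # NFKC folds full-width letters, digits, and hyphens to ASCII forms.
    normalized = unicodedata.normalize("NFKC", text)
    return MODEL_PATTERN.search(normalized) is not None


def matched_cue_languages(text: str) -> list[str]:
    """Return the sorted language codes whose cue phrases appear in the text."""
    normalized = unicodedata.normalize("NFKC", text).lower()
    return sorted(
        lang
        for lang, cues in GENERATION_CUES.items()
        if any(cue in normalized for cue in cues)
    )


def is_candidate(text: str) -> bool:
    """Keep a post only if it names the model and self-reports generation."""
    return mentions_model(text) and bool(matched_cue_languages(text))
```

In a pipeline of this shape, a text filter like `is_candidate` would only shortlist posts; the abstract describes a separate browser-automated "Made with AI" badge check as a further verification stage.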
