Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media

Zhen Sun; Zongmin Zhang; Xinyue Shen; Ziyi Zhang; Yule Liu; Michael Backes; Yang Zhang; Xinlei He

arXiv:2412.18148 · cs.AI · June 3, 2025 · 2 cites

TL;DR

This paper quantifies and monitors AI-Generated Texts on social media, revealing their increasing prevalence and distinct characteristics across platforms, and introduces a benchmark and detector for future research.

Contribution

It constructs a large dataset and benchmark for detecting AIGTs, develops a state-of-the-art detector, and provides comprehensive analysis of AIGT trends and features on social media.

Findings

01. AIGT prevalence is rising rapidly on Medium and Quora.

02. Reddit shows slower growth in AIGT presence.

03. AIGTs differ from human texts in linguistic and engagement patterns.

Abstract

Social media platforms are experiencing a growing presence of AI-Generated Texts (AIGTs). However, the misuse of AIGTs could have profound implications for public opinion, such as spreading misinformation and manipulating narratives. Despite its importance, it remains unclear how prevalent AIGTs are on social media. To address this gap, this paper aims to quantify and monitor the AIGTs on online social media platforms. We first collect a dataset (SM-D) with around 2.4M posts from 3 major social media platforms: Medium, Quora, and Reddit. Then, we construct a diverse dataset (AIGTBench) to train and evaluate AIGT detectors. AIGTBench combines popular open-source datasets and our AIGT datasets generated from social media texts by 12 LLMs, serving as a benchmark for evaluating mainstream detectors. With this setup, we identify the best-performing detector (OSM-Det). We then apply OSM-Det…
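The monitoring step described in the abstract amounts to running a detector over posts and tracking the share it flags per platform over time. The following is a minimal sketch of that bookkeeping only, not the authors' pipeline: the record layout `(platform, month, text)`, the function `flagged_share`, and `toy_detector` are all hypothetical, and the keyword rule stands in for a real detector such as OSM-Det.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record layout: (platform, "YYYY-MM", post text).
Post = Tuple[str, str, str]


def flagged_share(
    posts: Iterable[Post], is_ai: Callable[[str], bool]
) -> Dict[Tuple[str, str], float]:
    """Return the fraction of posts flagged by `is_ai` per (platform, month)."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    flagged: Dict[Tuple[str, str], int] = defaultdict(int)
    for platform, month, text in posts:
        key = (platform, month)
        totals[key] += 1
        if is_ai(text):
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in totals}


def toy_detector(text: str) -> bool:
    """Placeholder rule for illustration only; a trained classifier goes here."""
    return "as an ai" in text.lower()


# Made-up sample posts, used only to exercise the function.
sample_posts = [
    ("Medium", "2024-01", "As an AI language model, I think..."),
    ("Medium", "2024-01", "My weekend hiking trip went sideways."),
    ("Medium", "2024-02", "As an AI, here are ten productivity tips."),
    ("Reddit", "2024-01", "Anyone else seeing this bug in the new patch?"),
]

shares = flagged_share(sample_posts, toy_detector)
for (platform, month), share in sorted(shares.items()):
    print(f"{platform} {month}: {share:.0%} flagged")
```

Passing the detector in as a callable keeps the aggregation independent of any particular model, so the same function could be reused when comparing detectors on the same posts.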

Code & Models

Models

tarryzhang/OSM-Det (model, 26 downloads)

Datasets

tarryzhang/AIGTBench (dataset, 105 downloads)

Videos

Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media· underline

Taxonomy

Topics: Topic Modeling