When Detection Fails: The Power of Fine-Tuned Models to Generate Human-Like Social Media Text

Hillary Dawkins; Kathleen C. Fraser; Svetlana Kiritchenko

arXiv:2506.09975 · cs.CL · June 17, 2025

TL;DR

This paper investigates the challenge of detecting AI-generated social media posts, revealing that fine-tuned models can produce human-like content that evades detection, especially when the attacker keeps their models private.

Contribution

The study creates a dataset of 505,159 AI-generated social media posts covering 11 controversial topics and demonstrates that detectability declines significantly when the generating models are fine-tuned and kept private.

Findings

01. Detection accuracy drops when models are fine-tuned and private.

02. Humans struggle to distinguish AI-generated posts from real ones.

03. Detection algorithms are vulnerable to fine-tuned LLMs.

Abstract

Detecting AI-generated text is a difficult problem to begin with; detecting AI-generated text on social media is made even more difficult due to the short text length and informal, idiosyncratic language of the internet. It is nonetheless important to tackle this problem, as social media represents a significant attack vector in online influence campaigns, which may be bolstered through the use of mass-produced AI-generated posts supporting (or opposing) particular policies, decisions, or events. We approach this problem with the mindset and resources of a reasonably sophisticated threat actor, and create a dataset of 505,159 AI-generated social media posts from a combination of open-source, closed-source, and fine-tuned LLMs, covering 11 different controversial topics. We show that while the posts can be detected under typical research assumptions about knowledge of and access to the…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Spam and Phishing Detection