The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions

Azza Bouleimen; Giordano De Marzo; Taehee Kim; Nicolò Pagan; Hannah Metzler; Silvia Giordano; Anikó Hannák; David Garcia

arXiv:2511.08592 · cs.CL · April 2, 2026

TL;DR

This study shows that large language models can generate social media conversations that are convincing enough to deceive humans, with implications for social simulation and potential misuse.

Contribution

It provides an empirical evaluation of LLMs' ability to mimic human group conversations on social media, comparing Llama 3 and GPT-4o to real Reddit discussions.

Findings

01

LLM-generated conversations were mistaken for human-created content 39% of the time.

02

Participants identified Llama 3 conversations as AI-generated only 56% of the time.

03

LLMs can produce social media content that convincingly mimics human discussions.

Abstract

Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56% of the time, barely better…
