Don't Trust Generative Agents to Mimic Communication on Social Networks Unless You Benchmarked their Empirical Realism
Simon Münker, Nils Schwager, Achim Rettinger

TL;DR
This paper emphasizes the importance of empirically benchmarking generative AI agents' realism in social network simulations, highlighting that without such validation, their use in social science research is unreliable.
Contribution
It introduces a formal framework for simulating social networks and empirically tests LLMs' ability to imitate user communication in social media contexts.
Findings
Social simulations require empirical validation of realism.
LLMs show varying effectiveness in mimicking social media communication.
Rigorous benchmarking is essential for reliable social simulation using AI agents.
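The benchmarking idea in the findings above can be illustrated with a minimal sketch. This is not the paper's evaluation method: it assumes a simple token-overlap (Jaccard) score between each real reply and the agent-generated reply to the same post, and the function names (`tokenize`, `jaccard`, `realism_score`) and the toy data are hypothetical. A real benchmark would likely use stronger metrics and data collected in the target setting.

```python
from statistics import mean


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts, in [0, 1]."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 1.0  # two empty texts are treated as identical
    return len(ta & tb) / len(ta | tb)


def realism_score(real: list[str], generated: list[str]) -> float:
    """Mean overlap between each real reply and the agent reply to the same post.

    Assumption: replies are paired by position (same post, same user).
    """
    if len(real) != len(generated):
        raise ValueError("real and generated replies must be paired")
    if not real:
        raise ValueError("need at least one pair")
    return mean(jaccard(r, g) for r, g in zip(real, generated))


# Hypothetical toy data, not taken from the paper.
real_replies = ["totally agree with this", "no way that is true"]
agent_replies = ["totally agree with this", "that is simply not true"]
score = realism_score(real_replies, agent_replies)
print(f"realism score: {score:.3f}")
```

A score like this is only meaningful relative to a baseline measured in the same setting, for example the overlap between replies written by different real users to the same post.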
Abstract
The ability of Large Language Models (LLMs) to mimic human behavior triggered a plethora of computational social science research, assuming that empirical studies of humans can be conducted with AI agents instead. Since there have been conflicting research findings on whether and when this hypothesis holds, there is a need to better understand the differences in their experimental designs. We focus on replicating the behavior of social network users with the use of LLMs for the analysis of communication on social networks. First, we provide a formal framework for the simulation of social networks, before focusing on the sub-task of imitating user communication. We empirically test different approaches to imitate user behavior on X in English and German. Our findings suggest that social simulations should be validated by their empirical realism measured in the setting in which the…
Taxonomy
Topics: Language and cultural evolution · Opinion Dynamics and Social Influence · Computational and Text Analysis Methods
