Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

Jinyi Ye; Lei Cao; Ding Chen; Emilio Ferrara

arXiv:2605.18890·physics.soc-ph·May 20, 2026

TL;DR

This paper argues that LLM social simulations need robustness audits: it shows that perturbations that look minor can substantially shift simulation outcomes, and it proposes a taxonomy for systematic robustness validation.

Contribution

The paper introduces TRAILS, a comprehensive taxonomy for robustness audits in LLM social simulations, and advocates treating robustness auditing as a core validation step.

Findings

01. Minor perturbations can cause large shifts in simulation outcomes.

02. Robustness varies significantly across models and architectural choices.

03. Systematic robustness audits are essential for credible social simulation claims.

Abstract

The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber…
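
To make the "butterfly effect" in the abstract concrete, here is a minimal toy sketch of what a perturbation audit on a repeated Prisoner's Dilemma might look like. It is not the paper's case study and not the TRAILS procedure: the agents are rule-based tit-for-tat players standing in for LLM agents, and the names `run_repeated_pd`, `audit`, and `slip_prob` are hypothetical, chosen only for illustration. The perturbation is a small probability that an agent's intended move is flipped.

```python
import random
from statistics import mean


def run_repeated_pd(rounds, slip_prob, seed):
    """Play a repeated Prisoner's Dilemma between two tit-for-tat agents.

    Assumption (illustrative only): each intended move is flipped with
    probability `slip_prob`, a stand-in for a small implementation
    perturbation. Returns the fraction of cooperative moves.
    """
    rng = random.Random(seed)
    last_a, last_b = "C", "C"  # both agents start from cooperation
    cooperative_moves = 0
    for _ in range(rounds):
        # Tit-for-tat: copy the opponent's previous move.
        move_a, move_b = last_b, last_a
        # Apply the perturbation independently to each agent.
        if rng.random() < slip_prob:
            move_a = "D" if move_a == "C" else "C"
        if rng.random() < slip_prob:
            move_b = "D" if move_b == "C" else "C"
        cooperative_moves += (move_a == "C") + (move_b == "C")
        last_a, last_b = move_a, move_b
    return cooperative_moves / (2 * rounds)


def audit(perturbations, rounds=200, seeds=range(30)):
    """Mean cooperation rate across seeds for each named perturbation level."""
    return {
        name: mean(run_repeated_pd(rounds, p, s) for s in seeds)
        for name, p in perturbations.items()
    }


report = audit({"baseline": 0.0, "perturbed": 0.02})
shift = report["baseline"] - report["perturbed"]
print(report)
print("drop in mean cooperation rate:", shift)
```

In this toy setup the unperturbed baseline cooperates on every move, while a single flipped move is echoed back and forth by both agents in later rounds, so a small per-move perturbation lowers the macro-level cooperation rate. Averaging over several seeds separates the effect of the perturbation from run-to-run randomness, which is the basic shape of the audit the paper calls for.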
