Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits
Jinyi Ye, Lei Cao, Ding Chen, Emilio Ferrara

TL;DR
This paper argues that robustness audits are essential for LLM social simulations, shows that seemingly minor perturbations can substantially alter simulation outcomes, and proposes a taxonomy for systematic robustness validation.
Contribution
The paper introduces TRAILS, a comprehensive taxonomy for robustness audits in LLM social simulations, and advocates for robustness auditing as a core validation step.
Findings
Minor perturbations can cause large shifts in simulation outcomes.
Robustness varies significantly across models and architectural choices.
Systematic robustness audits are essential for credible social simulation claims.
Abstract
The scientific claims drawn from LLM social simulations should be no stronger than the robustness audits that support them. Generative agents bring new expressive power to agent-based modeling, enabling simulations of collective social processes like cooperation, polarization, and norm formation. Yet they also introduce complexity through additional architectural choices, such as agent specification, memory representation, interaction protocols, and environment design. Small perturbations that appear minor to researchers can cascade into macro-level outcomes through repeated interaction, creating a "butterfly effect." Consequently, scientific claims drawn from LLM social simulations may reflect implementation artifacts rather than the social mechanisms being modeled. We support this position with two case studies: a repeated Prisoner's Dilemma and a social media echo chamber…
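The cascade described in the abstract can be illustrated with a toy repeated Prisoner's Dilemma. The sketch below is not the paper's setup or the TRAILS procedure: it assumes two rule-based tit-for-tat agents as a stand-in for LLM agents, and the names `run_repeated_pd`, `audit`, and `flip_round` are hypothetical. It flips a single move of one agent and compares the overall cooperation rate against the unperturbed run.

```python
from typing import Dict, List, Optional


def tit_for_tat(opponent_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def run_repeated_pd(rounds: int, flip_round: Optional[int] = None) -> float:
    """Play tit-for-tat against tit-for-tat and return the cooperation rate.

    Assumption for illustration: the "perturbation" is a single flipped
    move by agent A at round `flip_round` (None means no perturbation).
    """
    hist_a: List[str] = []
    hist_b: List[str] = []
    for t in range(rounds):
        a = tit_for_tat(hist_b)
        b = tit_for_tat(hist_a)
        if flip_round is not None and t == flip_round:
            a = "D" if a == "C" else "C"  # one-off perturbation
        hist_a.append(a)
        hist_b.append(b)
    moves = hist_a + hist_b
    return moves.count("C") / len(moves)


def audit(rounds: int, flip_rounds: List[int]) -> Dict[str, float]:
    """Compare the baseline outcome with outcomes under each perturbation."""
    baseline = run_repeated_pd(rounds)
    perturbed = [run_repeated_pd(rounds, flip_round=t) for t in flip_rounds]
    return {
        "baseline": baseline,
        "min_perturbed": min(perturbed),
        "max_perturbed": max(perturbed),
    }


if __name__ == "__main__":
    print(run_repeated_pd(100))                 # 1.0: full cooperation
    print(run_repeated_pd(100, flip_round=10))  # 0.55: one flip, lasting echo
    print(audit(100, flip_rounds=[0, 10, 50, 90]))
```

In this toy setting a single flipped move locks the two agents into alternating retaliation for the rest of the game, so the macro-level cooperation rate depends heavily on when the perturbation happens. A real audit of an LLM simulation would perturb prompts, memory, or protocol choices instead, and would need repeated runs to separate perturbation effects from sampling noise.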
