LLM-Based Social Simulations Require a Boundary
Zengqing Wu, Run Peng, Takayuki Ito, Makoto Onizuka, Chuan Xiao

TL;DR
This paper emphasizes the importance of boundary-setting in LLM-based social simulations to ensure they accurately reflect behavioral diversity and contribute meaningfully to social science research.
Contribution
It highlights the limitations of current LLMs in capturing behavioral heterogeneity and proposes guidelines for validation practices and boundary-aware application in social simulations.
Findings
Fewer than half of the reviewed studies explicitly assess behavioral variance.
LLMs tend to produce lower behavioral variance than humans.
Validation should match the heterogeneity demands of the research question.
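The distinction between mean alignment and variance in the findings above can be illustrated with a small check. The following is only a minimal sketch, not the authors' method: the function name `heterogeneity_report`, the choice of a simple variance ratio as the spread measure, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean, variance


def heterogeneity_report(human, simulated):
    """Compare the center and spread of two samples of a numeric behavior.

    Hypothetical helper: returns the difference in means and the ratio of
    sample variances (simulated / human). A ratio well below 1 suggests the
    simulated agents are more homogeneous than the human sample.
    """
    var_human = variance(human)
    var_sim = variance(simulated)
    ratio = var_sim / var_human if var_human > 0 else float("nan")
    return {
        "mean_gap": mean(simulated) - mean(human),
        "variance_ratio": ratio,
    }


# Illustrative numbers only (not data from the paper): responses on a 0-10 scale.
human = [1.0, 3.0, 5.0, 5.0, 7.0, 9.0]
simulated = [4.0, 5.0, 5.0, 5.0, 5.0, 6.0]

report = heterogeneity_report(human, simulated)
print(report)
```

In this made-up example the two samples share the same mean, so a validation that compares only averages would report a perfect match, while the variance ratio shows that the simulated responses are far more tightly clustered.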
Abstract
This position paper argues that LLM-based social simulations require clear boundaries to make meaningful contributions to social science. While Large Language Models (LLMs) offer promising capabilities for simulating human behavior, their tendency to produce homogeneous outputs, acting as an "average persona", fundamentally limits their ability to capture the behavioral diversity essential for complex social dynamics. We examine why heterogeneity matters for social simulations and how current LLMs fall short, analyzing the relationship between mean alignment and variance in LLM-generated behaviors. Through a systematic review of representative studies, we find that validation practices often fail to match the heterogeneity requirements of research questions: while most papers include ground truth comparisons, fewer than half explicitly assess behavioral variance, and most that do report…
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper's greatest strength is its clear, concise, and persuasive writing. It's an excellent summary of the key challenges in the field.
2. The "mean-variance" analysis of the alignment problem is a useful and intuitive way to decompose the "average persona" issue.
3. The paper's central message, that the field must be more critical, define its boundaries, and move beyond simple "replication", is a crucial and timely corrective for the community.
4. The final "heuristic boundaries" (e.g.,
1. The primary weakness is the paper's failure to acknowledge and differentiate itself from "What Limits LLM-based Human Simulation: LLMs or Our Design?" (arXiv:2501.08579). This prior work identified the exact same "LLM-inherent limitations" (lack of diversity/heterogeneity and inconsistency) as the core bottlenecks. This submission does not offer a new conceptual leap beyond what is already present in that paper. For a position paper, where the idea is the main contribution, this overlap is a cri
* The topic is timely. Large language model-based social simulation is a rapidly emerging area that indeed requires systematic methodological reflection.
* The paper is clearly written and well-organized.
* The authors try to connect social scientific perspectives with large language model agent research, which could be informative for newcomers to the field.
1. Lack of novel research contribution. The paper primarily summarizes and comments on existing works. It does not introduce a new theoretical framework, formal model, or empirical study. ICLR normally expects some form of novel insight, methodology, or evaluation rather than a descriptive review.
2. Insufficient depth and evaluation. The proposed “boundary checklist” remains conceptual and is not validated through adequate case studies or quantitative analysis. Without such evidence, it is d
1. The heuristic boundaries and checklist offer actionable guidance for defining simulation scopes. They focus on collective patterns and validation availability. This bridges AI capabilities with social science needs effectively.
2. The work synthesizes extensive literature to position LLM simulations responsibly. It advocates for avoiding overclaims and focusing on beneficial applications. This fosters interdisciplinary collaboration between AI and social science fields.
1. Empirical demonstrations are absent as the paper relies solely on critiques of existing studies. No original simulations are conducted to illustrate the proposed boundaries. This limits the ability to verify the framework's practical utility.
2. Potential differences across LLM models are not differentiated in the analysis. Assumptions about universal limitations may not hold for all models or future versions. This could lead to overly broad generalizations.
3. Quantitative metrics for meas
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Artificial Intelligence in Law · Business Process Modeling and Analysis
