Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach
Xiaoyou Qin, Zhihong Li, Xiaoxiao Cheng

TL;DR
This paper introduces audience segmentation to improve heterogeneity in LLM-based social simulations, demonstrating its impact on distributional, structural, and predictive fidelity using U.S. climate opinion data.
Contribution
The paper systematically compares six segmentation configurations across two open-weight LLMs (Llama 3.1-70B and Mixtral 8x22B), showing how identifier granularity, parsimony, and selection logic affect simulation fidelity.
Findings
Moderate segmentation granularity can enhance performance.
Compact configurations often match or outperform larger ones.
Selection logic influences which fidelity dimension is preserved.
Abstract
Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offering scalable "silicon samples" that can approximate human data. However, current simulation practice often collapses diversity into an "average persona," masking subgroup variation that is central to social reality. This study introduces audience segmentation as a systematic approach for restoring heterogeneity in LLM-based social simulation. Using U.S. climate-opinion survey data, we compare six segmentation configurations across two open-weight LLMs (Llama 3.1-70B and Mixtral 8x22B), varying segmentation identifier granularity, parsimony, and selection logic (theory-driven, data-driven, and instrument-based). We evaluate simulation performance with a three-dimensional evaluation framework covering distributional, structural, and predictive fidelity. Results show that increasing…
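As an illustration of what distributional fidelity can mean in practice, the sketch below compares a simulated response distribution against a human one using Jensen-Shannon divergence. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the abstract does not name its metrics, so the choice of Jensen-Shannon divergence, the function names, the response categories, and the toy counts are all hypothetical.

```python
import math
from collections import Counter


def response_distribution(responses, categories):
    """Turn a list of categorical answers into a probability vector."""
    counts = Counter(responses)
    total = len(responses)
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy answers to one survey item (not data from the paper).
CATEGORIES = ["agree", "neutral", "disagree"]
human = ["agree"] * 50 + ["neutral"] * 20 + ["disagree"] * 30
# An "average persona" run that collapses toward the majority answer.
average_persona = ["agree"] * 90 + ["neutral"] * 10
# A pooled run over segment-specific personas that keeps minority answers.
segmented = ["agree"] * 55 + ["neutral"] * 15 + ["disagree"] * 30

p_human = response_distribution(human, CATEGORIES)
jsd_average = js_divergence(p_human, response_distribution(average_persona, CATEGORIES))
jsd_segmented = js_divergence(p_human, response_distribution(segmented, CATEGORIES))

print(f"JSD, average persona: {jsd_average:.3f}")
print(f"JSD, segmented:       {jsd_segmented:.3f}")
```

A lower divergence means the simulated answers sit closer to the human distribution. In this toy setup the segmented run scores lower only because the made-up counts were chosen to show the "average persona" collapse the abstract describes; it says nothing about the paper's actual results.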
