Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations?
Yue Huang, Zhengqing Yuan, Yujun Zhou, Kehan Guo, Xiangqi Wang, Haomin Zhuang, Weixiang Sun, Lichao Sun, Jindong Wang, Yanfang Ye, Xiangliang Zhang

TL;DR
This paper evaluates the reliability of Large Language Models in social simulations, introduces a new dataset for assessment, and proposes a reinforcement learning method to improve their consistency and trustworthiness.
Contribution
It introduces TrustSim, a comprehensive dataset for evaluating LLM reliability in social science simulations, and proposes AdaORPO, a reinforcement learning algorithm to enhance simulation consistency.
Findings
Inconsistencies persist across LLM-based social role simulations.
LLM performance does not strongly correlate with simulation consistency.
AdaORPO improves reliability across multiple LLMs.
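The second finding is a statement about rank correlation between two per-model scores. As a minimal sketch of how such a check could be run, the snippet below computes Spearman's rank correlation using only the standard library. The function names, model labels, and score values are hypothetical placeholders for illustration. They are not taken from the paper, and the paper may use a different correlation measure.

```python
from typing import Sequence


def ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder scores, NOT results from the paper.
general_performance = {"model_a": 0.81, "model_b": 0.74, "model_c": 0.69, "model_d": 0.62}
simulation_consistency = {"model_a": 0.55, "model_b": 0.71, "model_c": 0.48, "model_d": 0.66}

models = sorted(general_performance)
rho = spearman(
    [general_performance[m] for m in models],
    [simulation_consistency[m] for m in models],
)
print(f"Spearman rho over {len(models)} placeholder models: {rho:.2f}")
```

A coefficient near 0 would be in line with the finding that stronger general performance does not imply more consistent simulation, while a value near 1 would contradict it.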
Abstract
Large Language Models (LLMs) are increasingly employed for simulations, enabling applications in role-playing agents and Computational Social Science (CSS). However, the reliability of these simulations is under-explored, which raises concerns about the trustworthiness of LLMs in these applications. In this paper, we aim to answer "How reliable is LLM-based simulation?" To address this, we introduce TrustSim, an evaluation dataset covering 10 CSS-related topics, to systematically investigate the reliability of LLM simulation. We conducted experiments on 14 LLMs and found that inconsistencies persist in the LLM-based simulated roles. In addition, the consistency level of LLMs does not strongly correlate with their general performance. To enhance the reliability of LLMs in simulation, we propose Adaptive Learning Rate Based ORPO (AdaORPO), a reinforcement learning-based algorithm…
Taxonomy
Topics: Artificial Intelligence in Law
