PolicySimEval: A Benchmark for Evaluating Policy Outcomes through Agent-Based Simulation
Jiaju Kang, Puyu Han, Tian Zhang, Luqi Gong

TL;DR
PolicySimEval is a comprehensive benchmark designed to evaluate the effectiveness of agent-based simulation models in complex policy assessment tasks, highlighting current limitations and guiding future improvements.
Contribution
This paper introduces PolicySimEval, the first benchmark for systematically assessing agent-based models in policy evaluation, covering diverse scenarios and sub-tasks.
Findings
State-of-the-art systems achieve low coverage on benchmark tasks.
Current models struggle with complex policy simulation scenarios.
The benchmark reveals significant gaps in existing simulation capabilities.
Abstract
With the growing adoption of agent-based models in policy evaluation, a pressing question arises: Can such systems effectively simulate and analyze complex social scenarios to inform policy decisions? Addressing this challenge could significantly enhance the policy-making process, offering researchers and practitioners a systematic way to validate, explore, and refine policy outcomes. To advance this goal, we introduce PolicySimEval, the first benchmark designed to evaluate the capability of agent-based simulations in policy assessment tasks. PolicySimEval aims to reflect the real-world complexities faced by social scientists and policymakers. The benchmark is composed of three categories of evaluation tasks: (1) 20 comprehensive scenarios that replicate end-to-end policy modeling challenges, complete with annotated expert solutions; (2) 65 targeted sub-tasks that address specific…
Taxonomy
Topics: demographic modeling and climate adaptation
