PolicySimEval: A Benchmark for Evaluating Policy Outcomes through Agent-Based Simulation

Jiaju Kang; Puyu Han; Tian Zhang; Luqi Gong

arXiv:2502.07853 · cs.MA · February 13, 2025

TL;DR

PolicySimEval is a comprehensive benchmark designed to evaluate the effectiveness of agent-based simulation models in complex policy assessment tasks, highlighting current limitations and guiding future improvements.

Contribution

This paper introduces PolicySimEval, the first benchmark for systematically assessing agent-based models in policy evaluation, covering diverse scenarios and sub-tasks.

Findings

01. State-of-the-art systems achieve low coverage on benchmark tasks.

02. Current models struggle with complex policy simulation scenarios.

03. Benchmark reveals significant gaps in existing simulation capabilities.

Abstract

With the growing adoption of agent-based models in policy evaluation, a pressing question arises: Can such systems effectively simulate and analyze complex social scenarios to inform policy decisions? Addressing this challenge could significantly enhance the policy-making process, offering researchers and practitioners a systematic way to validate, explore, and refine policy outcomes. To advance this goal, we introduce PolicySimEval, the first benchmark designed to evaluate the capability of agent-based simulations in policy assessment tasks. PolicySimEval aims to reflect the real-world complexities faced by social scientists and policymakers. The benchmark is composed of three categories of evaluation tasks: (1) 20 comprehensive scenarios that replicate end-to-end policy modeling challenges, complete with annotated expert solutions; (2) 65 targeted sub-tasks that address specific…

Taxonomy

Topics: demographic modeling and climate adaptation