PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models

Han Bao; Penghao Zhang; Yue Huang; Zhengqing Yuan; Yanchi Ru; Rui Su; Yujun Zhou; Xiangqi Wang; Kehan Guo; Nitesh V Chawla; Yanfang Ye; Xiangliang Zhang

arXiv:2604.12995 · cs.CL · April 15, 2026

TL;DR

This paper introduces PolicyBench, a large-scale benchmark for evaluating LLMs' policy comprehension across memorization, understanding, and application, and proposes PolicyMoE, a specialized model that improves policy reasoning.

Contribution

It presents the first comprehensive policy comprehension benchmark and a domain-specific Mixture-of-Experts model to enhance LLMs' policy reasoning capabilities.

Findings

01 Models perform best on application-oriented tasks.

02 PolicyMoE outperforms baseline models on structured reasoning.

03 Current LLMs have notable limitations in policy understanding.

Abstract

Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-system benchmark (US-China) evaluating policy comprehension, comprising 21K cases across a broad spectrum of policy areas, capturing the diversity and complexity of real-world governance. Following Bloom's taxonomy, the benchmark assesses three core capabilities: (1) Memorization: factual recall of policy knowledge, (2) Understanding: conceptual and contextual reasoning, and (3) Application: problem-solving in real-life policy scenarios. Building on this benchmark, we further propose PolicyMoE, a domain-specialized…
