Benchmarking LLMs for Political Science: A United Nations Perspective

Yueqing Liang; Liangwei Yang; Chen Wang; Congying Xia; Rui Meng; Xiongxiao Xu; Haoran Wang; Ali Payani; Kai Shu

arXiv:2502.14122 · cs.CL · January 26, 2026

TL;DR

This paper introduces UNBench, a comprehensive benchmark dataset for evaluating large language models on political science tasks related to UN decision-making, highlighting their potential and limitations in high-stakes political contexts.

Contribution

The paper presents the first dataset and benchmark designed specifically to assess how well LLMs model UN political decision processes across multiple tasks.

Findings

01 LLMs show promise in understanding UN decision-making tasks

02 Challenges remain in accurately simulating political dynamics

03 Insights into LLM strengths and limitations in political science

Abstract

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation.…
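
The abstract describes the four tasks as interconnected. One way to see the link between representative voting simulation and draft adoption prediction is the Security Council's voting rule for substantive matters: a draft is adopted with at least nine affirmative votes and no negative vote from a permanent member. The sketch below is illustrative only. It is not taken from the UNBench repository, and the function names, the vote labels ("yes", "no", "abstain"), and the placeholder elected members are all assumptions made for this example.

```python
# Illustrative sketch only; not from the UNBench repository.
# Vote labels and helper names are assumptions made for this example.

PERMANENT_MEMBERS = frozenset({
    "China", "France", "Russian Federation", "United Kingdom", "United States",
})


def draft_adopted(votes: dict[str, str]) -> bool:
    """Apply the UNSC rule for substantive drafts to a roll call.

    `votes` maps each member to "yes", "no", or "abstain".
    A draft passes with at least nine "yes" votes and no "no"
    vote from a permanent member (an abstention is not a veto).
    """
    yes_count = sum(1 for vote in votes.values() if vote == "yes")
    vetoed = any(votes.get(member) == "no" for member in PERMANENT_MEMBERS)
    return yes_count >= 9 and not vetoed


def voting_accuracy(predicted: dict[str, str], actual: dict[str, str]) -> float:
    """Fraction of members whose simulated vote matches the recorded vote."""
    if not actual:
        raise ValueError("actual roll call is empty")
    matches = sum(1 for member, vote in actual.items()
                  if predicted.get(member) == vote)
    return matches / len(actual)


# Hypothetical roll call: five permanent members plus ten placeholder
# elected members (real elected membership rotates and is not modeled here).
actual = {member: "yes" for member in PERMANENT_MEMBERS}
actual.update({f"Elected{i}": "yes" for i in range(1, 11)})
actual["China"] = "abstain"

# A simulated roll call that gets one elected member's vote wrong.
predicted = dict(actual)
predicted["Elected3"] = "no"

print(draft_adopted(actual))
print(draft_adopted(predicted))
print(voting_accuracy(predicted, actual))
```

In this hypothetical roll call, one wrong per-member vote lowers simulation accuracy without changing the adoption outcome, which is why the two tasks can be scored separately. The rule as coded applies to substantive drafts only; procedural votes are not subject to the veto and are not covered here.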

Code & Models

Repositories

yueqingliang1/unbench (Official)

Videos

Benchmarking LLMs for Political Science: A United Nations Perspective · Underline

Taxonomy

Topics: Legal Education and Practice Innovations · Law, AI, and Intellectual Property