WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models

Qiyue Yin; Pei Xu; Qiaozhe Li; Shengda Liu; Shengqi Shen; Tong Wang; Yihong Han; Xiaonan Zhao; Likun Yang; Shiyue Cao; Shiyu Qiu; Yuxuan Liu; Shizhao Yu; Lei Cui; Chengxin Yan; Jie Sun; Xiangquan Tang; Kaiqi Huang

arXiv:2506.10264 · cs.AI · June 13, 2025

Open Access

TL;DR

WGSR-Bench is a novel benchmark using wargame scenarios to evaluate large language models' strategic reasoning, including multi-agent decision-making and counterfactual reasoning, addressing a critical gap in AI capabilities.

Contribution

This paper introduces WGSR-Bench, the first strategic reasoning benchmark for LLMs based on wargame environments, integrating core tasks for comprehensive assessment.

Findings

01. LLMs show strengths in environmental awareness and opponent modeling.

02. The benchmark reveals limitations in strategic adaptability of current LLMs.

03. WGSR-Bench provides a systematic framework for evaluating multi-agent strategic reasoning.

Abstract

Recent breakthroughs in Large Language Models (LLMs) have led to a qualitative leap in artificial intelligence's performance on reasoning tasks, particularly demonstrating remarkable capabilities in mathematical, symbolic, and commonsense reasoning. However, as a critical component of advanced human cognition, strategic reasoning, i.e., the ability to assess multi-agent behaviors in dynamic environments, formulate action plans, and adapt strategies, has yet to be systematically evaluated or modeled. To address this gap, this paper introduces WGSR-Bench, the first strategy reasoning benchmark for LLMs using wargame as its evaluation environment. Wargame, a quintessential high-complexity strategic scenario, integrates environmental uncertainty, adversarial dynamics, and non-unique strategic choices, making it an effective testbed for assessing LLMs' capabilities in multi-agent…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI