CHBench: A Cognitive Hierarchy Benchmark for Evaluating Strategic Reasoning Capability of LLMs

Hongtao Liu; Zhicheng Du; Zihe Wang; Weiran Shen

arXiv:2508.11944·cs.AI·August 19, 2025

TL;DR

CHBench is an evaluation framework, inspired by cognitive hierarchy models from behavioral economics, that measures the strategic reasoning level of large language models across normal-form games and shows how chat and memory mechanisms change that level.

Contribution

This paper presents CHBench, a novel benchmark that systematically evaluates LLMs' strategic reasoning using a cognitive hierarchy model across multiple games.

Findings

01. LLMs show consistent reasoning levels across opponents

02. Chat mechanism reduces strategic reasoning performance

03. Memory mechanism improves strategic reasoning

Abstract

Game-playing ability serves as an indicator for evaluating the strategic reasoning capability of large language models (LLMs). Most existing studies rely on utility performance metrics, which are not robust enough because they vary with opponent behavior and game structure. To address this limitation, we propose the Cognitive Hierarchy Benchmark (CHBench), a novel evaluation framework inspired by the cognitive hierarchy models from behavioral economics. We hypothesize that agents have bounded rationality: different agents behave at varying reasoning depths, or levels. We evaluate LLMs' strategic reasoning through a three-phase systematic framework, utilizing behavioral data from six state-of-the-art LLMs across fifteen carefully selected normal-form games. Experiments show that LLMs exhibit consistent strategic reasoning levels across diverse opponents, confirming the framework's…
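The abstract does not spell out which cognitive hierarchy model CHBench uses or how levels are estimated, so the following is only a minimal sketch of the standard Poisson cognitive hierarchy idea from behavioral economics (Camerer, Ho, and Chong), not the paper's method. It assumes a level-0 player randomizes uniformly and a level-k player best-responds to a Poisson-weighted mix of all lower levels. The function names `poisson_ch_strategies` and `best_response`, the parameter `tau`, and the 3x3 payoff matrix are all hypothetical and chosen purely for illustration.

```python
import math
import numpy as np


def best_response(expected):
    """Uniform mix over the actions that maximize expected payoff."""
    best = np.isclose(expected, expected.max())
    return best / best.sum()


def poisson_ch_strategies(payoff_row, payoff_col, tau=1.5, max_level=3):
    """Per-level strategies in a two-player normal-form game (illustrative only).

    payoff_row[i, j]: row player's payoff when row plays i and column plays j.
    payoff_col[i, j]: column player's payoff for the same action profile.
    tau: assumed mean of the Poisson distribution over reasoning levels.
    """
    n_row, n_col = payoff_row.shape
    # Level 0 is assumed to randomize uniformly over its actions.
    row_levels = [np.full(n_row, 1.0 / n_row)]
    col_levels = [np.full(n_col, 1.0 / n_col)]
    # Poisson frequencies f(h) = exp(-tau) * tau^h / h! for h = 0..max_level.
    freq = np.array([
        math.exp(-tau) * tau ** h / math.factorial(h)
        for h in range(max_level + 1)
    ])
    for k in range(1, max_level + 1):
        # A level-k player believes opponents are levels 0..k-1,
        # in proportion to the truncated, renormalized Poisson weights.
        weights = freq[:k] / freq[:k].sum()
        col_belief = sum(w * s for w, s in zip(weights, col_levels))
        row_belief = sum(w * s for w, s in zip(weights, row_levels))
        # Both beliefs are computed before either level-k strategy is stored.
        row_levels.append(best_response(payoff_row @ col_belief))
        col_levels.append(best_response(row_belief @ payoff_col))
    return row_levels, col_levels


# Hypothetical symmetric game with actions 0 = "high", 1 = "mid", 2 = "low".
A = np.array([
    [2.0, 0.0, 0.0],
    [4.0, 1.0, 0.0],
    [1.0, 2.0, 1.0],
])
row_levels, col_levels = poisson_ch_strategies(A, A.T)
for level, strategy in enumerate(row_levels):
    print(f"level {level}: {strategy}")
```

In this toy game the chosen action shifts as the reasoning level rises, which is the kind of behavioral signature a level-based benchmark can try to read off observed play instead of relying on raw payoffs.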

Taxonomy

Topics: ERP Systems Implementation and Impact · Semantic Web and Ontologies · Big Data and Business Intelligence