Equilibrium Residuals Expose Three Regimes of Matrix-Game Strategic Reasoning in Language Models

Wenhua Nie; Binhan Luo; Zijie Meng; Jyh-Shing Roger Jang; Ching-Wen Ma

arXiv:2605.10410·cs.LG·May 12, 2026

TL;DR

This paper tests whether large language models can carry out strategic computation on zero-sum matrix games once semantic cues are removed. It finds distinct reasoning regimes and argues that procedurally generated benchmarks are needed to measure strategic capability rather than recall of named games.

Contribution

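The paper separates three regimes of matrix-game behavior in language models (semantic recall, learned approximate Nash computation, and an output-interface bottleneck that limits scale) and uses the exploitability residual as a measure of approximate equilibrium computation, supported by theoretical and experimental results.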

Findings

01

A model that recognizes familiar named games drops to 34%, 18%, and 2% success on anonymous $2 \times 2$, $3 \times 3$, and $5 \times 5$ payoff matrices.

02

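Supervised fine-tuning on $2 \times 2$ and $3 \times 3$ games only raises success on unseen $5 \times 5$ to $7 \times 7$ games from 2% to 61%; exploitability-reward training averages 37% with high seed variance.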

03

The exploitability residual is $2$-Lipschitz in payoff perturbations, unlike discontinuous vertex-returning LP equilibrium selectors, which the paper links to transferability.

Abstract

Large language models can score well on named game-theory benchmarks while failing on the same strategic computation once semantic cues are removed. We show this gap with procedurally generated zero-sum matrix games: a model that recognizes familiar games drops to 34%, 18%, and 2% success on anonymous $2 \times 2$, $3 \times 3$, and $5 \times 5$ payoff matrices. The benchmark separates semantic recall, learned approximate Nash computation, and an output-interface bottleneck that limits scale. Training only on $2 \times 2$ and $3 \times 3$ games, supervised fine-tuning raises unseen $5 \times 5$ to $7 \times 7$ success from 2% to 61%, while exploitability-reward training averages 37% with high seed variance. We prove that the exploitability residual is $2$-Lipschitz in payoff perturbations, unlike discontinuous vertex-returning LP equilibrium selectors, explaining why residual training can…
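
To make the exploitability residual concrete, here is a minimal sketch in Python. It assumes the standard definition for a zero-sum game with row-player payoff matrix `A`: the residual of a strategy pair `(x, y)` is the row player's best-response value against `y` minus the value the column player's best response holds `x` to. The abstract does not spell out the paper's exact definition or norm, so the function name `exploitability`, the max-entry norm used in the check, and the matching-pennies example are illustrative assumptions, not the authors' code.

```python
import numpy as np


def exploitability(A, x, y):
    """Nash gap of (x, y) in a zero-sum game (assumed standard definition).

    A is the row player's payoff matrix; x and y are mixed strategies
    (probability vectors) for the row and column player.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_row_value = np.max(A @ y)  # best payoff the row player can get against y
    best_col_value = np.min(x @ A)  # lowest payoff the column player can hold x to
    return float(best_row_value - best_col_value)


# Matching pennies: the uniform profile is the unique equilibrium.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])

gap_equilibrium = exploitability(A, uniform, uniform)  # 0.0 at equilibrium
gap_pure = exploitability(A, pure, uniform)            # 1.0: the pure row strategy is exploitable

# Numerical check of the 2-Lipschitz bound under the max-entry norm
# (the choice of norm is an assumption made for this sketch).
rng = np.random.default_rng(0)
worst_ratio = 0.0
for _ in range(1000):
    M = rng.uniform(-1.0, 1.0, size=(5, 5))
    E = rng.uniform(-0.1, 0.1, size=(5, 5))  # payoff perturbation
    p = rng.dirichlet(np.ones(5))            # random row strategy
    q = rng.dirichlet(np.ones(5))            # random column strategy
    change = abs(exploitability(M + E, p, q) - exploitability(M, p, q))
    worst_ratio = max(worst_ratio, change / np.max(np.abs(E)))
```

Under these assumptions the bound follows because each of the two best-response terms moves by at most the largest entry of the perturbation, so `worst_ratio` never exceeds 2. By contrast, a solver that returns a vertex of the LP solution set can jump between vertices under an arbitrarily small payoff change, which is the discontinuity the abstract refers to.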
