Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play

Xiachong Feng; Deyi Yin; Xiaocheng Feng; Yi Jiang; Libo Qin; Yangfan Ye; Lei Huang; Weitao Ma; Qiming Li; Yuxuan Gu; Bing Qin; Lingpeng Kong

arXiv:2604.17696 · cs.AI · April 21, 2026

TL;DR

STRATAGEM is a game self-play training method for language models that aims to make learned reasoning transfer across domains. It selectively reinforces trajectories that show abstract, domain-agnostic reasoning and rewards adaptive reasoning development, rather than relying only on terminal game outcomes.

Contribution

The paper proposes STRATAGEM, which improves reasoning transfer in language models by reinforcing domain-agnostic reasoning patterns through a Reasoning Transferability Coefficient and encouraging reasoning evolution through a Reasoning Evolution Reward.

Findings

01

Significant improvements on mathematical reasoning benchmarks.

02

Enhanced general reasoning and code generation performance.

03

Ablation studies confirm the effectiveness of both proposed components (the Reasoning Transferability Coefficient and the Reasoning Evolution Reward).

Abstract

Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabilistic inference, and adaptive decision-making. However, existing self-play approaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics. We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer: domain specificity, where learned patterns remain anchored in game semantics, and contextual stasis, where static game contexts fail to cultivate progressive reasoning. STRATAGEM selectively reinforces trajectories exhibiting abstract, domain-agnostic reasoning through a Reasoning Transferability Coefficient, while incentivizing adaptive reasoning development via a Reasoning Evolution Reward. Experiments across mathematical…
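The abstract names two training signals but does not give their formulas. As a rough illustration only, the sketch below assumes a group of self-play trajectories sampled for the same game, adds a hypothetical Reasoning Evolution Reward to each trajectory's terminal outcome, and then scales each trajectory's advantage by a hypothetical transferability coefficient in [0, 1], so that trajectories judged game-specific barely move the policy. The names `Trajectory`, `shaped_return`, `modulated_advantages`, and `evolution_weight`, the additive reward combination, and the group-mean baseline are all assumptions made for illustration; this is not the STRATAGEM implementation from the ydyyyy/Stratagem repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """One self-play game trajectory (hypothetical container, not the repo's API)."""
    outcome: float           # terminal game result, e.g. +1 win, 0 draw, -1 loss
    transferability: float   # assumed Reasoning Transferability Coefficient in [0, 1]
    evolution_reward: float  # assumed Reasoning Evolution Reward for this trajectory


def shaped_return(traj: Trajectory, evolution_weight: float) -> float:
    # Terminal outcome plus a weighted evolution bonus (additive form is an assumption).
    return traj.outcome + evolution_weight * traj.evolution_reward


def modulated_advantages(trajs: List[Trajectory], evolution_weight: float = 0.1) -> List[float]:
    """Illustrative trajectory-modulated advantages for one group of games."""
    if not trajs:
        return []
    for t in trajs:
        if not 0.0 <= t.transferability <= 1.0:
            raise ValueError("transferability must lie in [0, 1]")
    returns = [shaped_return(t, evolution_weight) for t in trajs]
    # Group-mean baseline over trajectories for the same game (assumption).
    baseline = sum(returns) / len(returns)
    # Scale each advantage by the trajectory's coefficient: a coefficient near 0
    # (game-specific reasoning) shrinks that trajectory's contribution to the update.
    return [t.transferability * (r - baseline) for t, r in zip(trajs, returns)]


if __name__ == "__main__":
    group = [
        Trajectory(outcome=1.0, transferability=1.0, evolution_reward=0.0),
        Trajectory(outcome=1.0, transferability=0.0, evolution_reward=0.0),
        Trajectory(outcome=-1.0, transferability=0.5, evolution_reward=0.0),
    ]
    print(modulated_advantages(group))
```

In the usage example, the two winning games receive the same terminal reward, but only the one with a high coefficient gets a positive advantage; the winning game with coefficient 0 is effectively ignored. Scaling the advantage (rather than the reward) is one simple way to "selectively reinforce" trajectories; the paper may use a different formulation.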


Code & Models

Repositories

ydyyyy/Stratagem (GitHub)
