TL;DR
STRATAGEM is a game-based training method for language models that aims to improve how well reasoning skills transfer to other domains. It selectively reinforces game trajectories that show abstract, domain-agnostic reasoning, and it rewards reasoning that adapts over the course of play.
Contribution
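The paper proposes STRATAGEM, an approach to improving reasoning transfer in language models with two components: a Reasoning Transferability Coefficient, which selectively reinforces trajectories exhibiting domain-agnostic reasoning patterns, and a Reasoning Evolution Reward, which incentivizes adaptive reasoning development.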
Findings
Significant improvements on mathematical reasoning benchmarks.
Enhanced general reasoning and code generation performance.
Ablation studies confirm the effectiveness of both proposed components.
Abstract
Games offer a compelling paradigm for developing general reasoning capabilities in language models, as they naturally demand strategic planning, probabilistic inference, and adaptive decision-making. However, existing self-play approaches rely solely on terminal game outcomes, providing no mechanism to distinguish transferable reasoning patterns from game-specific heuristics. We present STRATAGEM, which addresses two fundamental barriers to reasoning transfer: domain specificity, where learned patterns remain anchored in game semantics, and contextual stasis, where static game contexts fail to cultivate progressive reasoning. STRATAGEM selectively reinforces trajectories exhibiting abstract, domain-agnostic reasoning through a Reasoning Transferability Coefficient, while incentivizing adaptive reasoning development via a Reasoning Evolution Reward. Experiments across mathematical…
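The abstract names two signals but this summary does not give their formulas. The following is a minimal sketch, not the authors' method, of one way a terminal game outcome could be combined with a transferability weight and an evolution bonus. The `Trajectory` fields, the `shaped_reward` function, the multiplicative and additive combination, and the `evolution_weight` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """One self-play game, reduced to three scalar signals (all hypothetical)."""
    outcome: float          # terminal game outcome, e.g. +1.0 for a win, -1.0 for a loss
    transferability: float  # assumed score in [0, 1]: how domain-agnostic the reasoning is
    evolution: float        # assumed score in [0, 1]: how much the reasoning adapts during play


def shaped_reward(traj: Trajectory, evolution_weight: float = 0.5) -> float:
    """Combine the three signals into one scalar training reward.

    Assumed form: the transferability score scales the terminal outcome,
    and the evolution score is added as a weighted bonus. The paper's
    actual formulation may differ.
    """
    if not 0.0 <= traj.transferability <= 1.0:
        raise ValueError("transferability must lie in [0, 1]")
    if not 0.0 <= traj.evolution <= 1.0:
        raise ValueError("evolution must lie in [0, 1]")
    return traj.transferability * traj.outcome + evolution_weight * traj.evolution


# Two winning games: the one with more abstract reasoning earns the larger reward.
abstract_win = Trajectory(outcome=1.0, transferability=0.9, evolution=0.4)
heuristic_win = Trajectory(outcome=1.0, transferability=0.2, evolution=0.4)
assert shaped_reward(abstract_win) > shaped_reward(heuristic_win)
```

Under this assumed form, a win reached through game-specific heuristics contributes little to training, which mirrors the abstract's point that terminal outcomes alone cannot separate transferable reasoning from heuristics.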
