Play to Generalize: Learning to Reason Through Game Play

Yunfei Xie; Yinsong Ma; Shiyi Lan; Alan Yuille; Junfei Xiao; Chen Wei

arXiv:2506.08011·cs.CV·October 10, 2025

TL;DR

This paper introduces Visual Game Learning (ViGaL), a novel RL-based post-training method where multimodal large language models improve reasoning skills by playing arcade-like games, leading to better performance on diverse reasoning benchmarks.

Contribution

The paper presents a new RL post-training approach using gameplay to enhance reasoning in multimodal models, outperforming specialized models without additional supervised data.

Findings

01. Training on simple arcade games improves multimodal reasoning performance.

02. The model outperforms specialist models on reasoning benchmarks.

03. Gameplay-based training preserves general visual capabilities.

Abstract

Developing reasoning capabilities in multimodal large language models (MLLMs) remains challenging. Motivated by literature suggesting that gameplay promotes transferable reasoning skills, we propose a novel post-training method, Visual Game Learning (ViGaL), where MLLMs develop generalizable reasoning skills through playing arcade-like games. Specifically, we show that training a 7B-parameter MLLM via reinforcement learning (RL) on simple games like Snake significantly enhances the downstream performance on multimodal math benchmarks like MathVista, on multi-discipline questions like MMMU and on 3D spatial reasoning benchmarks like VSI-Bench, without seeing any worked solutions, equations, or diagrams during RL. Remarkably, our model outperforms specialist models post-trained on benchmark-oriented multimodal reasoning data, while preserving the model's performance on general visual…
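The reviews on this page describe the gameplay rewards as rule-based and verifiable. As a rough illustration only, here is a minimal sketch of what such a reward could look like for a single Snake move. The grid convention, the move names, the function `snake_move_reward`, and the binary survive-or-die reward are all assumptions made for this example. They are not taken from the paper or its repository, and the sketch ignores details such as the tail cell vacating on each step.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

# Hypothetical move vocabulary and (x, y) grid convention; not from the paper.
MOVES = {"UP": (0, -1), "DOWN": (0, 1), "LEFT": (-1, 0), "RIGHT": (1, 0)}


def snake_move_reward(head: Cell, body: List[Cell], move: str, grid_size: int) -> float:
    """Return 1.0 if the proposed move keeps the snake alive, else 0.0.

    The check is purely rule-based: no learned reward model or human label
    is needed, only the game state and the game's rules.
    """
    if move not in MOVES:
        # An unparseable or unknown move earns no reward.
        return 0.0
    dx, dy = MOVES[move]
    nx, ny = head[0] + dx, head[1] + dy
    in_bounds = 0 <= nx < grid_size and 0 <= ny < grid_size
    hits_body = (nx, ny) in body
    return 1.0 if in_bounds and not hits_body else 0.0
```

For example, with the head at `(0, 0)` on a 5x5 grid and a body segment at `(0, 1)`, the moves `"UP"` (off the grid) and `"DOWN"` (into the body) score 0.0, while `"RIGHT"` scores 1.0. Because the reward is computed from the rules alone, it can be checked exactly for every sampled response during RL.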

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

- The paper shows that RL post-training purely on simple visual games (Snake, Rotation) yields measurable gains on other seemingly unrelated domains such as math despite no direct supervision from those tasks. This is a surprising finding and is worth spreading.
- The gameplay setup enables verifiable rule-based rewards that are friendly to reasoning training, avoiding the need for expensive reward models or human labels. The fact that it can generalize to other domains shows that it has high po

Weaknesses

* The games are relatively simple, which is understandable though because it is the first effort to explore this direction.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

- The idea of using structured games to indirectly train reasoning is novel and very interesting to me.
- The performance of using games improves 5–8% on math and spatial tasks, showing gameplay can transfer reasoning skills.
- The authors provided careful analysis on reward, prompts, and difficulty to show the effectiveness of each component.

Weaknesses

- It is not very clear how to design a game for different types of reasoning abilities. That being said, although games are useful for post training, it requires design for each task.
- The authors are currently only designing two kinds of games with spatial/math-like games used, it is unclear whether other reasoning abilities can also be solved by games.
- The paper provides limited analysis of why the game can improve the math reasoning ability. Specifically, what kind of questions in the benc

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1) Clean Experimental Setup and Broad Benchmarking: The paper evaluates the proposed method on a wide range of established multimodal reasoning benchmarks (e.g., MathVista, MMMU, CLEVR+), which lends credibility to the reported performance gains. The experimental design is generally sound, and the use of rule-based RL avoids the complexity of reward modeling.
2) Demonstration of Cross-Domain Performance Gain: It is empirically shown that training on a simple game can lead to measurable improvem

Weaknesses

1) Lack of Novelty: Repackaging Known Ideas Without Substantial Advance. The idea that training on one task can improve performance on another, i.e., multi-task learning (MTL) or transfer learning, is decades old. The use of pretext tasks in self-supervised learning has been standard in vision and NLP. Even within RL, curriculum learning and autocurricula have long demonstrated that simple environments can give rise to complex behaviors. The paper does not convincingly argue why using Snake as a s

Code & Models

Repositories

yunfeixie233/vigal · PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning

Methods: Balanced Selection