LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning

Zhen Hao Wong; Jingwen Deng; Runming He; Zirong Chen; Qijie You; Hejun Dong; Hao Liang; Chengyu Shen; Bin Cui; Wentao Zhang

arXiv:2506.04821 · cs.LG · June 6, 2025

TL;DR

This paper introduces a reinforcement learning framework that fine-tunes large language models using logic puzzles to enhance their general reasoning abilities and improve performance on diverse mathematical tasks.

Contribution

It presents a novel 'play to learn' approach that uses custom logic puzzles and reinforcement learning to cultivate transferable reasoning skills in LLMs, surpassing traditional fine-tuning methods.

Findings

1. Improved out-of-distribution performance on mathematical benchmarks.
2. Enhanced reasoning in algebra, geometry, and combinatorics.
3. Limited gains on rote or highly specialized tasks.

Abstract

Large language models (LLMs) excel at many supervised tasks but often struggle with structured reasoning in unfamiliar settings. This discrepancy suggests that standard fine-tuning pipelines may instill narrow, domain-specific heuristics rather than fostering general-purpose thinking strategies. In this work, we propose a "play to learn" framework that fine-tunes LLMs through reinforcement learning on a suite of seven custom logic puzzles, each designed to cultivate distinct reasoning skills such as constraint propagation, spatial consistency, and symbolic deduction. Using a reinforcement learning setup with verifiable rewards, models receive binary feedback based on puzzle correctness, encouraging iterative, hypothesis-driven problem solving. We demonstrate that this training approach significantly improves out-of-distribution performance on a range of mathematical benchmarks,…
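
To make the binary, verifiable reward concrete, here is a minimal sketch of what such a checker might look like. It is an illustration only: the Latin-square puzzle, the answer format (one row per line, space-separated integers), and the names `parse_answer`, `is_valid_latin_square`, `respects_clues`, and `binary_reward` are all assumptions made for this example, not the paper's puzzles or code.

```python
from typing import List, Optional

Grid = List[List[int]]


def parse_answer(text: str, n: int) -> Optional[Grid]:
    """Parse a hypothetical answer format: n lines of space-separated integers."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    try:
        grid = [[int(tok) for tok in ln] for ln in lines]
    except ValueError:
        return None  # non-numeric tokens: unparseable answer
    return grid if len(grid) == n else None


def is_valid_latin_square(grid: Grid, n: int) -> bool:
    """True if grid is n x n and each row and column is a permutation of 1..n."""
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    target = set(range(1, n + 1))
    rows_ok = all(set(row) == target for row in grid)
    cols_ok = all({grid[r][c] for r in range(n)} == target for c in range(n))
    return rows_ok and cols_ok


def respects_clues(grid: Grid, clues: Grid) -> bool:
    """Clues use 0 for an empty cell; every given cell must be preserved."""
    return all(
        clue == 0 or clue == cell
        for clue_row, row in zip(clues, grid)
        for clue, cell in zip(clue_row, row)
    )


def binary_reward(model_output: str, clues: Grid) -> float:
    """Return 1.0 only for a fully correct solution, else 0.0 (no partial credit)."""
    n = len(clues)
    grid = parse_answer(model_output, n)
    if grid is None:
        return 0.0
    solved = is_valid_latin_square(grid, n) and respects_clues(grid, clues)
    return 1.0 if solved else 0.0


if __name__ == "__main__":
    clues = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
    print(binary_reward("1 2 3\n2 3 1\n3 1 2", clues))
    print(binary_reward("1 2 3\n1 2 3\n1 2 3", clues))
```

Because the reward is computed by a deterministic checker rather than a learned model, any answer that fails to parse, breaks a constraint, or contradicts a given clue receives the same zero signal as any other wrong answer.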

Code & Models

Repositories

wongzhenhao/GameRL (PyTorch, official)

Taxonomy

Topics: Mathematics Education and Teaching Techniques · Cognitive and developmental aspects of mathematical skills · Machine Learning in Materials Science