Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
Tian Xie, Zitian Gao, Qingnan Ren, Haoming Luo, Yuqian Hong, Bryan Dai, Joey Zhou, Kai Qiu, Zhirong Wu, Chong Luo

TL;DR
This paper introduces Logic-RL, a rule-based reinforcement learning approach that enhances large language models' reasoning abilities through synthetic logic puzzles, leading to improved generalization on complex math benchmarks.
Contribution
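It presents a rule-based RL training method built on three technical strategies: a system prompt that emphasizes the thinking and answering process, a stringent format reward, and a straightforward training recipe. Together these enable a 7B model to develop advanced reasoning skills and generalize to challenging math tasks.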
Findings
The 7B model develops advanced reasoning skills such as reflection, verification, and summarization.
Training on just 5K logic problems enables generalization to the AIME and AMC math benchmarks.
Stable RL training is achieved through system prompts and reward design.
Abstract
Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make three key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills (such as reflection, verification, and summarization) that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it generalizes to the challenging math benchmarks AIME and AMC.
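To make the idea of a rule-based format reward concrete, here is a minimal sketch of how such a check might look. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tag names, the reward values, or how the format and answer rewards are combined, so `<think>`/`<answer>`, the numeric scores, and the helper names `format_reward`, `answer_reward`, and `total_reward` are all hypothetical.

```python
import re

# Hypothetical tag names: the abstract does not say which delimiters are used.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"
ANSWER_OPEN, ANSWER_CLOSE = "<answer>", "</answer>"


def format_reward(response: str) -> float:
    """Return +1.0 for a well-formed response, -1.0 otherwise.

    Well-formed here means: each tag appears exactly once, the tags occur in
    the order think-open, think-close, answer-open, answer-close, and the
    thinking section is not empty. The +1/-1 values are assumptions.
    """
    tags = [THINK_OPEN, THINK_CLOSE, ANSWER_OPEN, ANSWER_CLOSE]
    # Each tag must appear exactly once (no missing or repeated sections).
    if any(response.count(tag) != 1 for tag in tags):
        return -1.0
    # Tags must appear in the expected order.
    positions = [response.find(tag) for tag in tags]
    if positions != sorted(positions):
        return -1.0
    # Penalize the shortcut of skipping the reasoning entirely.
    thinking = response[positions[0] + len(THINK_OPEN):positions[1]]
    if not thinking.strip():
        return -1.0
    return 1.0


def answer_reward(response: str, expected: str) -> float:
    """Score the extracted answer by exact string match (assumed values)."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return -2.0
    return 2.0 if match.group(1).strip() == expected.strip() else -1.5


def total_reward(response: str, expected: str) -> float:
    """Combine both rewards; a malformed response never earns answer credit."""
    fmt = format_reward(response)
    if fmt < 0:
        return fmt
    return fmt + answer_reward(response, expected)
```

For example, `total_reward("<think>2 + 1 = 3</think><answer>3</answer>", "3")` gives the full score under these assumed values. A response that contains only an `<answer>` block is penalized regardless of whether the answer is correct. Synthetic puzzles suit this design because their answers can be checked by simple rules rather than by a learned reward model.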
Taxonomy
Topics: Natural Language Processing Techniques
