Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Tian Xie; Zitian Gao; Qingnan Ren; Haoming Luo; Yuqian Hong; Bryan Dai; Joey Zhou; Kai Qiu; Zhirong Wu; Chong Luo

arXiv:2502.14768·cs.CL·February 21, 2025·2 cites

TL;DR

This paper introduces Logic-RL, a rule-based reinforcement learning approach that enhances large language models' reasoning abilities through synthetic logic puzzles, leading to improved generalization on complex math benchmarks.

Contribution

The paper presents a novel rule-based RL training method with specific technical strategies (a reasoning-focused system prompt, a strict format reward, and a stable training recipe) that enable a 7B model to develop advanced reasoning skills and generalize to challenging math tasks.

Findings

01. The model acquires advanced reasoning skills such as reflection and verification.
02. Training on 5K logic problems enables generalization to the AIME and AMC math benchmarks.
03. Stable RL training is achieved through the system prompt and reward design.
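Finding 03 credits stable training to the system prompt and reward design. The sketch below is one way a strict rule-based reward could look, not the authors' implementation. It assumes a hypothetical output template (reasoning in <think>...</think>, final answer in <answer>...</answer>), a knights-and-knaves style answer ("Name is a knight/knave"), and illustrative reward values; the names format_reward, answer_reward, and total_reward are hypothetical.

```python
import re
from typing import Dict

# Hypothetical template: reasoning in <think>...</think>, then the final
# answer in <answer>...</answer>, with nothing before or after.
FORMAT_RE = re.compile(
    r"^\s*<think>(?P<think>.+?)</think>\s*<answer>(?P<answer>.+?)</answer>\s*$",
    re.DOTALL,
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")

# Illustrative reward values (assumptions, not quoted from the paper summary).
FORMAT_OK, FORMAT_BAD = 1.0, -1.0
ANSWER_CORRECT, ANSWER_WRONG, ANSWER_MISSING = 2.0, -1.5, -2.0

# Assumed answer style for knights-and-knaves puzzles: "Name is a knight/knave".
ROLE_RE = re.compile(r"(\w+)\s+is\s+a\s+(knight|knave)", re.IGNORECASE)


def format_reward(text: str) -> float:
    """Strict format check: one think block followed by one answer block."""
    m = FORMAT_RE.match(text)
    if m is None:
        return FORMAT_BAD
    # Reject repeated or nested tags that the lazy groups had to swallow.
    inner = m.group("think") + m.group("answer")
    if any(tag in inner for tag in TAGS):
        return FORMAT_BAD
    return FORMAT_OK


def parse_roles(answer: str) -> Dict[str, str]:
    """Map lower-cased names to 'knight' or 'knave'."""
    return {name.lower(): role.lower() for name, role in ROLE_RE.findall(answer)}


def answer_reward(text: str, gold: Dict[str, str]) -> float:
    """Full credit only when every person's role matches the gold solution."""
    m = FORMAT_RE.match(text)
    pred = parse_roles(m.group("answer")) if m else {}
    if not pred:
        return ANSWER_MISSING  # no parseable answer block
    return ANSWER_CORRECT if pred == gold else ANSWER_WRONG


def total_reward(text: str, gold: Dict[str, str]) -> float:
    return format_reward(text) + answer_reward(text, gold)


if __name__ == "__main__":
    gold = {"zoey": "knight", "oliver": "knave"}
    out = "<think>Check both cases.</think><answer>Zoey is a knight, Oliver is a knave</answer>"
    print(total_reward(out, gold))  # 3.0
```

Anchoring the regex at both ends is what makes the check "stringent" here: skipping the reasoning block, adding text after the answer, or emitting extra answer tags all lose the format reward. The exact penalty sizes are a tuning choice.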

Abstract

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills (such as reflection, verification, and summarization) that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
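The abstract picks logic puzzles for their controllable complexity and straightforward answer verification. As an illustration only, assuming a knights-and-knaves style puzzle (this summary does not name the puzzle family), the sketch below encodes each person's claim as a Python function and brute-forces all 2^n role assignments. The number of people n acts as a difficulty knob, and checking a model's answer reduces to comparing it with the unique consistent assignment. The functions solve and verify and the example puzzle are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

# True = knight (always tells the truth), False = knave (always lies).
Assignment = Dict[str, bool]
Claim = Callable[[Assignment], bool]


def solve(people: List[str], claims: Dict[str, Claim]) -> List[Assignment]:
    """Enumerate all 2**n role assignments and keep the consistent ones.

    An assignment is consistent when every knight's claim is true and
    every knave's claim is false, i.e. claim(world) == world[speaker].
    """
    solutions = []
    for roles in product([True, False], repeat=len(people)):
        world = dict(zip(people, roles))
        if all(claims[p](world) == world[p] for p in people):
            solutions.append(world)
    return solutions


def verify(answer: Assignment, people: List[str], claims: Dict[str, Claim]) -> bool:
    """Accept an answer only if the puzzle has exactly one solution and it matches."""
    solutions = solve(people, claims)
    return len(solutions) == 1 and solutions[0] == answer


# Hypothetical two-person puzzle:
#   Zoey says: "Oliver is a knave."
#   Oliver says: "Zoey and I are both knights."
PEOPLE = ["Zoey", "Oliver"]
CLAIMS: Dict[str, Claim] = {
    "Zoey": lambda w: not w["Oliver"],
    "Oliver": lambda w: w["Zoey"] and w["Oliver"],
}

if __name__ == "__main__":
    print(solve(PEOPLE, CLAIMS))  # [{'Zoey': True, 'Oliver': False}]
```

Brute force is adequate for small puzzles, though 2^n grows quickly with the number of people. A generator built this way could also discard any puzzle for which solve returns more than one assignment, so that every training example has a single verifiable answer.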

Taxonomy

Topics: Natural Language Processing Techniques