SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning
Huanyu Liu, Ge Li, Jia Li, Hao Zhu, Kechi Zhang, Yihong Dong

TL;DR
Saturn introduces a SAT-based reinforcement learning framework that enables scalable, verifiable, and controllable reasoning task training for large language models, significantly improving their reasoning capabilities across various benchmarks.
Contribution
The paper presents Saturn, a novel SAT-based RL framework for training LLMs with scalable task construction, rule-based verification, and precise difficulty control, advancing reasoning abilities.
Findings
Saturn-1.5B and Saturn-7B outperform previous models on SAT problems.
Significant improvements on math and programming benchmarks.
Outperforms state-of-the-art RL task construction methods by +8.8%.
Abstract
How to design reinforcement learning (RL) tasks that effectively unleash the reasoning capability of large language models (LLMs) remains an open question. Existing RL tasks (e.g., math, programming, and constructed reasoning tasks) suffer from three key limitations: (1) Scalability. They rely heavily on human annotation or expensive LLM synthesis to generate sufficient training data. (2) Verifiability. LLMs' outputs are hard to verify automatically and reliably. (3) Controllable Difficulty. Most tasks lack fine-grained difficulty control, making it hard to train LLMs to develop reasoning ability from easy to hard. To address these limitations, we propose Saturn, a SAT-based RL framework that uses Boolean Satisfiability (SAT) problems to train and evaluate LLM reasoning. Saturn enables scalable task construction, rule-based verification, and precise difficulty control. Saturn…
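To make the three properties concrete, here is a minimal sketch of how SAT instances can serve as RL tasks. It is not Saturn's actual pipeline. It assumes CNF formulas in DIMACS-style integer literals, a random generator whose size parameters (`num_vars`, `clause_len`, `num_clauses`, all hypothetical names) act as difficulty knobs, and a "planted" hidden assignment so every generated instance is solvable. Verification is purely rule-based: an assignment is correct if every clause contains at least one true literal. The binary `reward` function is likewise a hypothetical illustration.

```python
import random
from typing import Dict, List, Tuple

Clause = List[int]  # DIMACS-style literals: +v means x_v, -v means NOT x_v


def generate_planted_cnf(
    num_vars: int, clause_len: int, num_clauses: int, seed: int = 0
) -> Tuple[List[Clause], Dict[int, bool]]:
    """Sample a random CNF formula guaranteed to be satisfied by a hidden assignment.

    Hypothetical difficulty knobs: more variables and clauses generally make
    instances harder. Requires clause_len <= num_vars.
    """
    rng = random.Random(seed)
    hidden = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    formula: List[Clause] = []
    while len(formula) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        # Keep only clauses the hidden assignment satisfies (planted solution).
        if any(hidden[abs(lit)] == (lit > 0) for lit in clause):
            formula.append(clause)
    return formula, hidden


def verify(formula: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Rule-based check: every clause needs at least one true literal.

    Variables missing from the assignment are treated as False.
    """
    def literal_true(lit: int) -> bool:
        return assignment.get(abs(lit), False) == (lit > 0)

    return all(any(literal_true(lit) for lit in clause) for clause in formula)


def reward(formula: List[Clause], assignment: Dict[int, bool]) -> float:
    """Hypothetical binary RL reward: 1.0 if the model's answer satisfies the formula."""
    return 1.0 if verify(formula, assignment) else 0.0


if __name__ == "__main__":
    toy = [[1, -2], [2, 3], [-1, -3]]
    print(verify(toy, {1: True, 2: True, 3: False}))  # True
    print(verify(toy, {1: True, 2: False, 3: True}))  # False: clause [-1, -3] fails
```

Because checking an assignment takes time linear in the formula size, rewards can be computed cheaply and exactly for any number of generated instances, which is the kind of scalability and verifiability the abstract highlights.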
Taxonomy
Topics: Natural Language Processing Techniques
