SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning

Huanyu Liu; Ge Li; Jia Li; Hao Zhu; Kechi Zhang; Yihong Dong

arXiv:2505.16368 · cs.LG · March 11, 2026

TL;DR

Saturn is a SAT-based reinforcement learning framework that makes reasoning-task training for large language models scalable, verifiable, and controllable in difficulty, significantly improving their reasoning on math and programming benchmarks.

Contribution

The paper presents Saturn, a novel SAT-based RL framework for training LLMs that offers scalable task construction, rule-based verification, and precise difficulty control to advance their reasoning abilities.

Findings

01 Saturn-1.5B and Saturn-7B outperform previous models on SAT problems.

02 Both models also improve significantly on math and programming benchmarks.

03 Saturn improves by +8.8% over state-of-the-art task construction methods.

Abstract

How to design reinforcement learning (RL) tasks that effectively unleash the reasoning capability of large language models (LLMs) remains an open question. Existing RL tasks (e.g., math, programming, and constructing reasoning tasks) suffer from three key limitations: (1) Scalability. They rely heavily on human annotation or expensive LLM synthesis to generate sufficient training data. (2) Verifiability. LLMs' outputs are hard to verify automatically and reliably. (3) Controllable Difficulty. Most tasks lack fine-grained difficulty control, making it hard to train LLMs to develop reasoning ability from easy to hard. To address these limitations, we propose Saturn, a SAT-based RL framework that uses Boolean Satisfiability (SAT) problems to train and evaluate LLMs reasoning. Saturn enables scalable task construction, rule-based verification, and precise difficulty control. Saturn…
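The three properties named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`verify`, `generate_instance`, `reward`), the parameters (`num_vars`, `num_clauses`, `clause_len`), the planted-solution sampling, and the binary reward are all assumptions chosen for illustration. It shows how a SAT instance can be sampled programmatically (scalability), how a proposed assignment can be checked by a simple rule (verifiability), and how instance size gives a difficulty knob (controllable difficulty).

```python
import random
from typing import Dict, List, Tuple

# DIMACS-style literals: +i means variable i is True, -i means it is False.
Clause = List[int]


def verify(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Rule-based check: every clause must contain at least one true literal.

    A literal whose variable is missing from the assignment counts as false.
    """
    def literal_true(lit: int) -> bool:
        var = abs(lit)
        return var in assignment and assignment[var] == (lit > 0)

    return all(any(literal_true(lit) for lit in clause) for clause in clauses)


def generate_instance(
    num_vars: int, num_clauses: int, clause_len: int, rng: random.Random
) -> Tuple[List[Clause], Dict[int, bool]]:
    """Sample a satisfiable instance by planting a random solution.

    Hypothetical generator (an assumption, not the paper's procedure):
    clauses are drawn uniformly and kept only if the planted assignment
    satisfies them. Requires clause_len <= num_vars.
    """
    planted = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    clauses: List[Clause] = []
    while len(clauses) < num_clauses:
        variables = rng.sample(range(1, num_vars + 1), clause_len)
        clause = [v if rng.random() < 0.5 else -v for v in variables]
        if verify([clause], planted):
            clauses.append(clause)
    return clauses, planted


def reward(clauses: List[Clause], assignment: Dict[int, bool]) -> float:
    """Assumed binary RL reward: 1.0 for a satisfying assignment, else 0.0."""
    return 1.0 if verify(clauses, assignment) else 0.0
```

In this sketch, raising `num_vars`, `num_clauses`, or `clause_len` yields larger instances, which is one plausible way to step difficulty from easy to hard; the difficulty measure the paper actually uses is not shown in this excerpt.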

Code & Models

Repositories

gtxygyzb/saturn-code (PyTorch, official)

Videos

SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques