Learning from Algorithm Feedback: One-Shot SAT Solver Guidance with GNNs
Jan T\"onshoff, Martin Grohe

TL;DR
This paper presents a reinforcement learning approach using GNNs to guide SAT solver heuristics, significantly improving solving speed and generalization over traditional methods.
Contribution
Introduces RLAF, a novel framework for training GNN-guided heuristics in SAT solvers using reinforcement learning and a generic variable weighting mechanism.
Findings
Achieves over 2x speedup in SAT solving times.
Generalizes well to larger, more complex problems.
Outperforms supervised heuristic learning methods.
Abstract
Boolean Satisfiability (SAT) solvers are foundational to computer science, yet their performance typically hinges on hand-crafted heuristics. This work introduces Reinforcement Learning from Algorithm Feedback (RLAF) as a paradigm for learning to guide SAT solver branching heuristics with Graph Neural Networks (GNNs). Central to our approach is a novel and generic mechanism for injecting inferred variable weights and polarities into the branching heuristics of existing SAT solvers. In a single forward pass, a GNN assigns these parameters to all variables. Casting this one-shot guidance as a reinforcement learning problem lets us train the GNN with off-the-shelf policy-gradient methods, such as GRPO, directly using the solver's computational cost as the sole reward signal. Extensive evaluations demonstrate that RLAF-trained policies significantly reduce the mean solve times of different…
Peer Reviews
Decision·ICLR 2026 Poster
- The RLAF paradigm is a novel and effective concept. It learns by using the solver's own performance, i.e., its computational cost, as a reward signal. This help in eliminating the need for costly human labelling or hand-curated features (GNN can train itself). - The GNN runs just once per problem to generate a set of hints (weights and polarities) that guide the entire search. It resolves the computational bottleneck of prior RL/GNN methods that required per-decision GNN calls, which often mad
The primary limitations of the work, which the authors are commendably upfront about, relate to the scalability of training and the depth of the methodological analysis. The most significant constraint is the computationally intensive nature of the RL training loop. This currently confines the approach to smaller problem instances, leaving its effectiveness on industrial-scale SAT problems as a crucial direction for future work. Furthermore, the analysis would benefit from comprehensive ablatio
* The use of a NN once before the solver doesn't slow down the solver, and directly relates to SOTA methods for SAT solving. * The article is well written, including discussing the downsides
* Using RL forces the network to be trained only on small SAT instances, so it was not attempted on serious problems.
• The paper is well written. • It investigates how RL-trained Graph Neural Networks (GNNs) can improve the branching heuristics of SAT solvers. • The authors modify existing DPLL-based backtracking SAT solvers to incorporate external variable weights into their branching heuristics. • The results demonstrate that, after training, the learned policies generalize well to significantly larger and more challenging problems. • I particularly appreciate the inclusion of the graph coloring and cryptogr
The work is demonstrated only on small-scale examples, with all training performed on machines equipped with a single multi-core CPU and one GPU. I wish the paper included more numerical experiments on larger-scale problems.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsConstraint Satisfaction and Optimization · Advanced Multi-Objective Optimization Algorithms · Formal Methods in Verification
MethodsBalanced Selection
