Beyond Inference-Time Search: Reinforcement Learning Synthesizes Reusable Solvers
Soheyl Massoudi, Gabriel Apaza, Milad Habibi, and Mark Fuge

TL;DR
This paper demonstrates that reinforcement learning can synthesize reusable solvers for combinatorial optimization problems, reducing inference costs and improving efficiency over traditional sampling methods.
Contribution
The paper introduces a reinforcement learning approach that trains a code LLM to generate reusable solvers for whole problem families, outperforming baseline heuristics and sampling methods in both efficiency and solution quality.
Findings
RL-synthesized solvers achieve a 5.0% gap to the Virtual Best Solver.
The learned solver is 91 times cheaper in execution cost than baseline sampling.
The approach transfers to Job Shop Scheduling beyond the original problem domain.
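The findings are reported as gaps to the Virtual Best Solver (VBS). The page does not give the exact formula, so the following is only a minimal sketch under stated assumptions: the objective is maximized, all VBS values are positive, the VBS score on an instance is the best score any solver in the pool reaches on it, and the reported gap is the mean relative shortfall in percent. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def virtual_best(scores_by_solver: Dict[str, List[float]]) -> List[float]:
    """Per-instance best objective across all solvers (maximization assumed)."""
    return [max(vals) for vals in zip(*scores_by_solver.values())]


def mean_gap_to_vbs(scores: List[float], vbs: List[float]) -> float:
    """Mean relative shortfall to the VBS, in percent (assumes vbs values > 0)."""
    gaps = [(best - s) / best for s, best in zip(scores, vbs)]
    return 100.0 * sum(gaps) / len(gaps)


# Hypothetical objective values on three instances for two solvers.
scores_by_solver = {
    "greedy": [80.0, 50.0, 90.0],
    "sampled": [100.0, 40.0, 90.0],
}

vbs = virtual_best(scores_by_solver)
greedy_gap = mean_gap_to_vbs(scores_by_solver["greedy"], vbs)
print(vbs, round(greedy_gap, 2))
```

Under this definition a gap of 0% means the solver matches the best known result on every instance, and no single solver needs to achieve the VBS on its own.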
Abstract
Large language models (LLMs) typically approach combinatorial optimization as an inference-time procedure, solving each instance separately through sampling, search, or repeated prompting. We ask whether reinforcement learning can instead shift part of this reasoning cost into the weights of a code LLM, so that the model synthesizes a reusable solver for an entire problem family. We study this question on Synergistic Dependency Selection (SDS), a controlled variant of constrained Quadratic Knapsack designed to expose a specific failure mode: local signals and strict feasibility constraints make greedy heuristics attractive but unreliable. Under identical scaffolding, Best-of-64 base-model sampling saturates at an approximately 28.7% gap to the global Virtual Best Solver (VBS); code audits show that the base model often retrieves Simulated Annealing templates but misimplements the…
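To illustrate why greedy heuristics can look attractive yet fail when pairwise interactions matter, here is a toy quadratic knapsack instance. It is not the SDS benchmark: the weights, values, synergy bonus, and capacity are invented for illustration, and the greedy rule (rank by individual value per unit weight) is just one plausible baseline, not necessarily one used in the paper.

```python
from itertools import combinations

# Hypothetical toy instance: three items, one pairwise synergy bonus.
weights = [2, 2, 2]
values = [5.0, 1.0, 1.0]
synergy = {(1, 2): 10.0}  # bonus earned only if items 1 and 2 are both chosen
capacity = 4


def objective(subset):
    """Sum of individual values plus bonuses for fully selected pairs."""
    chosen = set(subset)
    total = sum(values[i] for i in chosen)
    total += sum(v for (i, j), v in synergy.items() if i in chosen and j in chosen)
    return total


def feasible(subset):
    """Strict capacity constraint on the total weight."""
    return sum(weights[i] for i in subset) <= capacity


def greedy():
    """Pick items by individual value/weight ratio, ignoring synergies."""
    order = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    picked = []
    for i in order:
        if feasible(picked + [i]):
            picked.append(i)
    return picked


def brute_force():
    """Exhaustive search over feasible subsets (only viable for tiny instances)."""
    n = len(values)
    candidates = (c for r in range(n + 1) for c in combinations(range(n), r) if feasible(c))
    return max(candidates, key=objective)


greedy_pick = greedy()
best_pick = brute_force()
print(greedy_pick, objective(greedy_pick))
print(best_pick, objective(best_pick))
```

The greedy rule commits to the item with the strongest local signal, which uses up the capacity that the synergistic pair would need; exhaustive search finds the pair instead.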
