TL;DR
ReLoop improves the reliability of LLM-generated optimization code by combining structured generation with behavioral verification. It targets silent failures, where code executes and returns a solver-feasible solution but encodes the wrong formulation, and it improves accuracy across benchmarks.
Contribution
ReLoop introduces a two-mechanism approach, structured generation followed by behavioral verification, to address semantic errors in LLM-produced optimization code.
Findings
Structured generation improves accuracy on compositional problems (+8.5pp).
Behavioral verification detects and corrects localized defects (+4.4pp).
ReLoop reaches a 100% code execution rate and improves accuracy across benchmarks.
Abstract
Large language models (LLMs) can translate natural language into optimization code, but silent failures pose a critical risk: code that executes and returns solver-feasible solutions may encode semantically incorrect formulations -- a feasibility-correctness gap reaching 90 percentage points on compositional problems. We introduce ReLoop, which addresses this gap through two complementary mechanisms. Structured generation decomposes code production into a four-stage reasoning chain (understand, formalize, synthesize, verify), preventing formulation errors at their source. Behavioral verification detects errors that survive generation by testing whether the formulation responds correctly to solver-based parameter perturbation -- an external semantic signal that bypasses LLM self-review and requires no ground truth. The two mechanisms are complementary by error structure: structured…
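The perturbation idea behind behavioral verification can be illustrated with a toy example. The sketch below is not ReLoop's implementation, and the abstract does not specify how its tests are constructed. It assumes a hypothetical continuous knapsack model, a hypothetical buggy variant that silently drops the capacity constraint, and a hypothetical helper `insensitive_parameters` that rescales each parameter and flags any whose change never moves the optimal objective.

```python
from typing import Callable, Dict, List, Sequence

Params = Dict[str, float]

# Hypothetical toy data: item values and weights for a continuous knapsack.
VALUES = [60.0, 100.0, 120.0]
WEIGHTS = [10.0, 20.0, 30.0]


def solve_correct(params: Params) -> float:
    """Maximize total value subject to a capacity limit (fractions allowed)."""
    remaining = params["capacity"]
    total = 0.0
    # Greedy by value density is optimal for the continuous knapsack.
    items = sorted(zip(VALUES, WEIGHTS), key=lambda vw: vw[0] / vw[1], reverse=True)
    for value, weight in items:
        if remaining <= 0.0:
            break
        take = min(1.0, remaining / weight)  # fraction of the item taken
        total += take * value
        remaining -= take * weight
    return total


def solve_buggy(params: Params) -> float:
    """Same model with the capacity constraint silently dropped.

    It still "runs" and returns a value, but capacity has no effect.
    """
    return sum(VALUES)


def insensitive_parameters(
    solve: Callable[[Params], float],
    params: Params,
    factors: Sequence[float] = (0.5, 2.0),
    tol: float = 1e-9,
) -> List[str]:
    """Return parameters whose rescaling never changes the optimal objective."""
    base = solve(params)
    flagged = []
    for name, value in params.items():
        responses = [solve({**params, name: value * f}) for f in factors]
        if all(abs(r - base) <= tol for r in responses):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    params = {"capacity": 50.0}
    print("correct model flags:", insensitive_parameters(solve_correct, params))
    print("buggy model flags:  ", insensitive_parameters(solve_buggy, params))
```

A flagged parameter is a signal rather than proof of a defect: a constraint that is legitimately non-binding would also leave the objective unchanged, so a check like this is best treated as a prompt for closer inspection. Its appeal is that it needs only the solver's response, with no ground-truth answer.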
