Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research
Ruicheng Ao, David Simchi-Levi, Xinshang Wang

TL;DR
This paper introduces two benchmarks that evaluate LLMs on operations research tasks with the solver inside an iterative loop, targeting self-correction and behavioral rationality, and shows that domain-specific training improves performance on both.
Contribution
It introduces benchmarks for solver-in-the-loop evaluation in operations research. They provide verifiable solver feedback and measure systematic decision bias, which moves evaluation beyond one-shot problem solving.
Findings
Domain-specific reinforcement learning with verifiable rewards (RLVR) improves recovery and diagnostic accuracy.
With targeted training, models resolve infeasible formulations in fewer repair steps.
Curriculum training reduces systematic bias in out-of-distribution (OOD) scenarios.
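To make "systematic bias" concrete for the newsvendor setting used by the bias benchmark (see the abstract), the following minimal sketch compares a model's order quantities against the textbook critical-fractile optimum q* = F^{-1}(c_u / (c_u + c_o)). Here c_u = price - cost is the underage cost and c_o = cost - salvage is the overage cost. It assumes normally distributed demand and uses mean signed relative deviation as the bias metric. `NewsvendorInstance`, `optimal_order`, and `signed_bias` are hypothetical names, and the benchmark's actual instance format and bias metric may differ.

```python
from dataclasses import dataclass
from statistics import NormalDist, mean


@dataclass
class NewsvendorInstance:
    # Hypothetical instance format; demand is ASSUMED normal for this sketch.
    price: float        # selling price per unit
    cost: float         # purchase cost per unit
    salvage: float      # salvage value per unsold unit
    demand_mean: float
    demand_std: float


def optimal_order(inst: NewsvendorInstance) -> float:
    """Closed-form newsvendor optimum via the critical fractile."""
    underage = inst.price - inst.cost    # margin lost per unit of unmet demand
    overage = inst.cost - inst.salvage   # loss per unsold unit
    if underage <= 0 or overage <= 0:
        raise ValueError("need price > cost > salvage for an interior optimum")
    critical_ratio = underage / (underage + overage)
    return NormalDist(inst.demand_mean, inst.demand_std).inv_cdf(critical_ratio)


def signed_bias(instances, decisions) -> float:
    """Mean relative deviation (q - q*) / q*; positive means systematic over-ordering."""
    if len(instances) != len(decisions):
        raise ValueError("one decision per instance is required")
    deviations = []
    for inst, q in zip(instances, decisions):
        q_star = optimal_order(inst)
        deviations.append((q - q_star) / q_star)
    return mean(deviations)


if __name__ == "__main__":
    inst = NewsvendorInstance(price=10, cost=4, salvage=1, demand_mean=100, demand_std=20)
    print(round(optimal_order(inst), 2))  # 108.61 (critical ratio 6 / 9 = 2/3)
```

A signed metric is used here, rather than absolute error, so that a consistent lean toward over- or under-ordering shows up as a nonzero average instead of being mixed in with random noise.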
Abstract
Operations Research practitioners routinely debug infeasible models through an iterative process: analyzing Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and systematically repairing formulations until feasibility is achieved. Yet existing LLM benchmarks evaluate OR as one-shot translation (given a problem description, generate solver code), ignoring this diagnostic loop entirely. We introduce two benchmarks that place the solver in the evaluation loop. ORDebug evaluates iterative self-correction through 5,000+ problems spanning 9 error types; each repair action triggers solver re-execution and IIS recomputation, providing deterministic, verifiable feedback. ORBias evaluates behavioral rationality through 2,000 newsvendor instances (1,000 in-distribution and 1,000 out-of-distribution), measuring systematic deviations from closed-form optimal…
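The debug loop described in the abstract (compute an IIS, repair, re-run the solver, recompute the IIS) can be illustrated with a toy sketch. It assumes a tiny integer model whose feasibility is checked by brute force in place of a real solver call. It extracts an IIS with the standard deletion filter and applies a placeholder repair policy (drop the last IIS member) where the benchmark would instead let the model under evaluation choose an action. The constraint encoding and the functions `feasible`, `deletion_filter_iis`, and `repair_loop` are hypothetical and are not the benchmark's API.

```python
from itertools import product

# Hypothetical toy encoding: (name, {var: coeff}, rhs) means sum(coeff * x[var]) <= rhs.
# Constraint names are assumed to be unique.
Constraint = tuple[str, dict[str, int], int]


def feasible(constraints: list[Constraint], variables: list[str], domain: range) -> bool:
    """Brute-force feasibility check standing in for a real solver call.

    Every variable ranges over `domain`, so the domain bounds act as
    implicit constraints that are never removed.
    """
    for values in product(domain, repeat=len(variables)):
        x = dict(zip(variables, values))
        if all(sum(a * x[v] for v, a in coeffs.items()) <= rhs
               for _, coeffs, rhs in constraints):
            return True
    return False


def deletion_filter_iis(constraints, variables, domain):
    """Deletion filter: shrink an infeasible set to an irreducible infeasible subsystem.

    A constraint is discarded if the set stays infeasible without it. What
    survives is infeasible, but removing any single member makes it feasible.
    """
    if feasible(constraints, variables, domain):
        raise ValueError("deletion filter requires an infeasible constraint set")
    iis = list(constraints)
    for name, _, _ in list(constraints):
        trial = [c for c in iis if c[0] != name]
        if not feasible(trial, variables, domain):
            iis = trial  # not needed for the conflict: drop it for good
    return iis


def repair_loop(constraints, variables, domain, max_steps=10):
    """Toy solver-in-the-loop: re-check feasibility and recompute the IIS after each repair.

    The repair policy (drop the last IIS member) is a placeholder for the
    action a model under evaluation would choose.
    """
    current = list(constraints)
    history = []  # one (IIS names, removed constraint name) entry per repair step
    for _ in range(max_steps):
        if feasible(current, variables, domain):
            break
        iis = deletion_filter_iis(current, variables, domain)
        victim = iis[-1][0]
        history.append(([c[0] for c in iis], victim))
        current = [c for c in current if c[0] != victim]
    return current, history


if __name__ == "__main__":
    model = [
        ("demand", {"x": -1, "y": -1}, -12),  # x + y >= 12
        ("cap_x", {"x": 1}, 5),               # x <= 5
        ("cap_y", {"y": 1}, 4),               # y <= 4
        ("budget", {"x": 2, "y": 1}, 30),     # 2x + y <= 30 (not part of the conflict)
    ]
    print([c[0] for c in deletion_filter_iis(model, ["x", "y"], range(11))])
    # ['demand', 'cap_x', 'cap_y']
```

Because every step re-runs the feasibility check and recomputes the IIS, the feedback after each repair is deterministic. A production setup would replace the brute-force oracle with a real solver's IIS routine.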
Taxonomy
Topics: Formal Methods in Verification · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning
