Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research

Ruicheng Ao; David Simchi-Levi; Xinshang Wang

arXiv:2601.21008 · cs.LG · February 10, 2026

TL;DR

This paper introduces two benchmarks that evaluate LLMs on operations research tasks with the solver in the evaluation loop, one targeting iterative self-correction and the other behavioral rationality, and reports improved performance from domain-specific training.

Contribution

It presents novel benchmarks for solver-in-the-loop evaluation in operations research, enabling verifiable feedback and systematic bias measurement, advancing beyond one-shot problem-solving assessments.

Findings

01 Domain-specific RLVR training improves recovery and diagnostic accuracy.

02 Targeted training lets models reach a resolution in fewer steps.

03 Curriculum training reduces systematic bias in out-of-distribution (OOD) scenarios.

Abstract

Operations Research practitioners routinely debug infeasible models through an iterative process: analyzing Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and systematically repairing formulations until feasibility is achieved. Yet existing LLM benchmarks evaluate OR as one-shot translation (given a problem description, generate solver code), ignoring this diagnostic loop entirely. We introduce two benchmarks that place the solver in the evaluation loop. ORDebug evaluates iterative self-correction through 5,000+ problems spanning 9 error types; each repair action triggers solver re-execution and IIS recomputation, providing deterministic, verifiable feedback. ORBias evaluates behavioral rationality through 2,000 newsvendor instances (1,000 ID + 1,000 OOD), measuring systematic deviations from closed-form optimal…
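
The abstract refers to closed-form optimal newsvendor decisions. As a minimal sketch of what such a reference point could look like, the following assumes the textbook critical-fractile solution with normally distributed demand. The demand model, the parameter names, and the signed-deviation bias measure are illustrative assumptions and are not taken from the paper.

```python
from statistics import NormalDist


def newsvendor_optimal_quantity(price, cost, salvage, mean, std):
    """Textbook newsvendor order quantity under Normal(mean, std) demand.

    Assumption: demand is normal; the benchmark's actual demand model
    is not stated in the excerpt above.
    """
    underage = price - cost      # profit lost per unit of unmet demand
    overage = cost - salvage     # loss per unsold unit
    critical_ratio = underage / (underage + overage)
    # q* is the demand quantile at the critical ratio.
    return NormalDist(mean, std).inv_cdf(critical_ratio)


def ordering_bias(ordered, optimal):
    """Hypothetical bias measure: signed deviation from the optimum.

    Positive values mean over-ordering, negative values under-ordering.
    """
    return ordered - optimal


# Illustrative instance (all numbers are made up for the example).
q_star = newsvendor_optimal_quantity(
    price=10.0, cost=4.0, salvage=1.0, mean=100.0, std=20.0
)
# An order equal to mean demand under-orders in this high-margin case.
bias = ordering_bias(ordered=100.0, optimal=q_star)
```

In this instance the critical ratio is 6 / (6 + 3) = 2/3, so the optimal quantity lies above mean demand, and an order placed at the mean shows a negative deviation.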

Taxonomy

Topics: Formal Methods in Verification · Constraint Satisfaction and Optimization · AI-based Problem Solving and Planning