Execution-Verified Reinforcement Learning for Optimization Modeling

Runda Guan; Xiangqing Shen; Jiajun Zhang; Yifan Zhang; Jian Cheng; Rui Xia

arXiv:2604.00442 · cs.AI · April 2, 2026

TL;DR

EVOM is an execution-verified reinforcement learning framework for optimization modeling. It turns solver execution outcomes into scalar rewards, enabling scalable, solver-agnostic decision intelligence.

Contribution

The paper presents EVOM, a framework that uses execution outcomes as rewards. This removes the need for supervision and enables cross-solver generalization in optimization modeling.
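The execute-and-reward step can be sketched as follows. This is a minimal illustration, not EVOM's harness: the `OBJECTIVE=<value>` output contract, the 1.0 / 0.1 / 0.0 reward levels, the tolerance, and the function names `run_generated_code`, `parse_objective`, and `execution_reward` are all hypothetical. The sketch runs generated Python in a separate interpreter with a timeout (process isolation only, not a security sandbox) and compares the reported objective with a reference optimum.

```python
import math
import subprocess
import sys
from typing import Optional

# Hypothetical output contract: the generated program prints one line
# "OBJECTIVE=<value>" after its solver call reports an optimal solution.
MARKER = "OBJECTIVE="


def run_generated_code(code: str, timeout_s: float = 30.0) -> Optional[str]:
    """Run generated code in a child interpreter; return stdout, or None on failure.

    A child process with a timeout isolates crashes and hangs, but it is NOT a
    security sandbox; a real harness would add container or OS-level limits.
    """
    try:
        proc = subprocess.run([sys.executable, "-c", code], capture_output=True,
                              text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def parse_objective(stdout: str) -> Optional[float]:
    """Extract the objective value printed under the assumed contract."""
    for line in stdout.splitlines():
        if line.startswith(MARKER):
            try:
                return float(line[len(MARKER):])
            except ValueError:
                return None
    return None


def execution_reward(stdout: Optional[str], reference: float,
                     tol: float = 1e-6) -> float:
    """Map one execution outcome to a scalar reward (illustrative scheme)."""
    if stdout is None:
        return 0.0  # crash, nonzero exit, or timeout
    objective = parse_objective(stdout)
    if objective is not None and math.isclose(objective, reference,
                                              rel_tol=tol, abs_tol=tol):
        return 1.0  # matches the reference optimum
    return 0.1      # ran cleanly, but objective missing or wrong (hypothetical partial credit)
```

Because only the final objective is graded, the same reward function can score programs written against any solver API, provided they print the same (assumed) contract.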

Findings

01 EVOM matches or outperforms supervised fine-tuning methods.
02 EVOM supports zero-shot transfer across different solvers.
03 EVOM adapts effectively and at low cost to a target solver through continued training with that solver.

Abstract

Automating optimization modeling with LLMs is a promising path toward scalable decision intelligence, but existing approaches either rely on agentic pipelines built on closed-source LLMs with high inference latency, or fine-tune smaller LLMs using costly process supervision that often overfits to a single solver API. Inspired by reinforcement learning with verifiable rewards, we propose Execution-Verified Optimization Modeling (EVOM), an execution-verified learning framework that treats a mathematical programming solver as a deterministic, interactive verifier. Given a natural-language problem and a target solver, EVOM generates solver-specific code, executes it in a sandboxed harness, and converts execution outcomes into scalar rewards, which are optimized with GRPO and DAPO in a closed-loop generate-execute-feedback-update process. This outcome-only formulation removes the need for…
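The abstract names GRPO and DAPO as the optimizers. A step shared by both is to sample a group of programs for the same problem and normalize each program's reward against its group. The sketch below shows only that step, under stated assumptions (population standard deviation, a small `eps`); skipping groups whose rewards are all equal mirrors DAPO's dynamic sampling, since such groups yield zero advantage and no learning signal. The name `group_advantages` is hypothetical, and this is not the paper's training code.

```python
import statistics
from typing import List, Optional


def group_advantages(rewards: List[float], eps: float = 1e-6) -> Optional[List[float]]:
    """Group-relative advantages for the G programs sampled for one problem.

    Returns None when every reward in the group is identical: all advantages
    would be zero, so the group is skipped (as in DAPO's dynamic sampling).
    """
    if len(set(rewards)) <= 1:
        return None
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (assumed)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, a group of four programs in which two reach the reference optimum, with rewards [1.0, 0.0, 0.0, 1.0], gets advantages of roughly [1, -1, -1, 1], so the policy update favors the two correct programs over the two incorrect ones.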
