OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling

Yitian Chen; Cheng Cheng; Yinan Sun; Zi Ling; Dongdong Ge

arXiv:2601.19924 · cs.CL · May 15, 2026

TL;DR

This paper uses OPT-ENGINE, a new benchmark framework with controllable complexity, to evaluate how well Large Language Models (LLMs) formulate and solve optimization problems. It finds that Pure-Text Reasoning (PTR) loses robustness as task complexity increases.

Contribution

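Introduces OPT-ENGINE, an extensible benchmark with quantifiable and controllable complexity that spans ten canonical Operations Research problems, from Linear Programming to Mixed-Integer Programming, and analyzes how pure-text reasoning and external computational tools affect problem formulation and solving.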

Findings

1. Pure-Text Reasoning (PTR) struggles with complex optimization tasks: its robustness degrades as task complexity increases.

2. External tools improve local calculations but not global constraints.

3. Solver-integrated Reasoning faces bottlenecks in constraint formulation.

Abstract

We investigate the capabilities and scalability of Large Language Models (LLMs) in optimization modeling, a domain requiring structured reasoning and precise formulation. To this end, we introduce OPT-ENGINE, an extensible benchmark framework with quantifiable and controllable complexity. OPT-ENGINE spans ten canonical Operations Research problems, systematically scaling from Linear Programming to Mixed-Integer Programming, providing a structured environment to probe the limits of automated problem formulation and solving. Using OPT-ENGINE, we address three pivotal research questions. First, we examine whether Pure-Text Reasoning (PTR) via classical Chain-of-Thought can efficiently tackle optimization tasks, finding that PTR suffers from a critical robustness gap as task complexity increases. Second, we examine whether integrating external computational tools can mitigate PTR's…

Code & Models

Repositories

Cardinal-Operations/OPTEngine
github

Models

chenyitian-shanshu/Qwen3-SIRL-4B
model · 12 downloads

Datasets

chenyitian-shanshu/OPTEngine
dataset · 15 downloads
