OPT-Engine: Benchmarking the Limits of LLMs in Optimization Modeling via Complexity Scaling
Yitian Chen, Cheng Cheng, Yinan Sun, Zi Ling, Dongdong Ge

TL;DR
This paper evaluates the capabilities and limitations of Large Language Models in optimization modeling using a new benchmark framework, revealing key challenges and guiding future research directions.
Contribution
Introduces OPT-ENGINE, a scalable benchmark for LLMs in optimization, and analyzes the effectiveness of reasoning and external tools in problem formulation.
Findings
Pure-Text Reasoning (PTR) shows a robustness gap that widens as optimization task complexity increases.
External tools improve local calculations but not global constraints.
Solver-integrated Reasoning faces bottlenecks in constraint formulation.
Abstract
We investigate the capabilities and scalability of Large Language Models (LLMs) in optimization modeling, a domain requiring structured reasoning and precise formulation. To this end, we introduce OPT-ENGINE, an extensible benchmark framework with quantifiable and controllable complexity. OPT-ENGINE spans ten canonical Operations Research problems, systematically scaling from Linear Programming to Mixed-Integer Programming, providing a structured environment to probe the limits of automated problem formulation and solving. Using OPT-ENGINE, we address three pivotal research questions. First, we examine whether Pure-Text Reasoning (PTR) via classical Chain-of-Thought can efficiently tackle optimization tasks, finding that PTR suffers from a critical robustness gap as task complexity increases. Second, we examine whether integrating external computational tools can mitigate PTR's…
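To make "quantifiable and controllable complexity" concrete, the following is a minimal sketch of what a complexity-scaled benchmark instance could look like. It is not the OPT-ENGINE implementation: the choice of 0/1 knapsack as the problem, the use of item count as the complexity knob, and the names `KnapsackInstance`, `make_instance`, `solve_exact`, and `is_correct` are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class KnapsackInstance:
    """Hypothetical instance of a 0/1 knapsack problem (a small integer program)."""
    values: list
    weights: list
    capacity: int


def make_instance(n_items: int, seed: int = 0) -> KnapsackInstance:
    """Generate a reproducible instance; n_items is the assumed complexity knob."""
    rng = random.Random(seed)
    values = [rng.randint(1, 50) for _ in range(n_items)]
    weights = [rng.randint(1, 20) for _ in range(n_items)]
    # Capacity at half the total weight, so the constraint actually binds.
    capacity = max(1, sum(weights) // 2)
    return KnapsackInstance(values, weights, capacity)


def solve_exact(inst: KnapsackInstance) -> int:
    """Ground-truth optimum via the standard dynamic program over capacities."""
    best = [0] * (inst.capacity + 1)
    for value, weight in zip(inst.values, inst.weights):
        # Iterate capacities downward so each item is used at most once.
        for cap in range(inst.capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[inst.capacity]


def is_correct(model_answer: int, inst: KnapsackInstance) -> bool:
    """Score a model's reported objective value against the exact optimum."""
    return model_answer == solve_exact(inst)
```

Under these assumptions, sweeping `n_items` over a range of values and recording the fraction of instances for which `is_correct` holds would give an accuracy-versus-complexity curve of the kind the abstract alludes to.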
