AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library
Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Shenhao Wang, Haris Koutsopoulos, Hai Wang, Cathy Wu, Jinhua Zhao

TL;DR
AlphaOPT introduces a self-improving experience library for LLMs that enhances optimization modeling by learning from solver-verified insights, enabling better generalization and transfer without retraining.
Contribution
It presents a novel two-phase framework for LLMs to learn, refine, and reuse optimization modeling knowledge from limited supervision and solver feedback.
Findings
AlphaOPT improves accuracy from 65% to 72% as training data grows from 100 to 300 items.
It outperforms baselines by over 8% on out-of-distribution datasets.
Structured experience learning enhances reasoning without retraining.
Abstract
Optimization modeling underlies critical decision-making across industries, yet remains difficult to automate: natural-language problem descriptions must be translated into precise mathematical formulations and executable solver code. Existing LLM-based approaches typically rely on brittle prompting or costly retraining, both of which offer limited generalization. Recent work suggests that large models can improve via experience reuse, but how to systematically acquire, refine, and reuse such experience in structurally constrained settings remains unclear. We present AlphaOPT, a self-improving experience library that enables LLMs to learn optimization modeling knowledge from limited supervision, including answer-only feedback without gold-standard programs, annotated reasoning traces, or parameter updates. AlphaOPT operates in a continual two-phase cycle: a \emph{Library…
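The acquire, refine, and reuse loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Insight` fields, the keyword-overlap retrieval, the pruning thresholds, and the `solve` and `propose` callables (stand-ins for an LLM plus solver and for an insight extractor) are all assumptions made for illustration. The only supervision used is the known optimal objective value, mirroring the answer-only setting.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Insight:
    """One library entry; these fields are illustrative, not the paper's schema."""
    keywords: frozenset   # hypothetical applicability condition
    advice: str           # modeling guidance to inject into the prompt
    uses: int = 0         # times this insight was retrieved
    successes: int = 0    # times the run that used it matched the known answer


@dataclass
class ExperienceLibrary:
    insights: List[Insight] = field(default_factory=list)

    def retrieve(self, problem: str) -> List[Insight]:
        """Return insights whose keywords overlap the problem text."""
        words = set(problem.lower().split())
        return [i for i in self.insights if i.keywords & words]

    def refine(self, min_uses: int = 3, min_rate: float = 0.5) -> None:
        """Drop insights that were tried often enough but rarely helped."""
        self.insights = [
            i for i in self.insights
            if i.uses < min_uses or i.successes / i.uses >= min_rate
        ]


def answers_match(predicted: Optional[float], expected: float,
                  rel_tol: float = 1e-6) -> bool:
    """Answer-only check: compare objective values, not programs."""
    return predicted is not None and math.isclose(
        predicted, expected, rel_tol=rel_tol, abs_tol=1e-9
    )


# Hypothetical stand-ins: `Solver` wraps "LLM writes a program, solver runs it";
# `Proposer` wraps "LLM reflects on a failure and writes a new insight".
Solver = Callable[[str, List[Insight]], Optional[float]]
Proposer = Callable[[str], Insight]


def learn_step(lib: ExperienceLibrary, problem: str, expected: float,
               solve: Solver, propose: Proposer) -> bool:
    """One learning step: reuse insights, verify by answer, learn from failure."""
    retrieved = lib.retrieve(problem)
    ok = answers_match(solve(problem, retrieved), expected)
    for insight in retrieved:
        insight.uses += 1
        insight.successes += int(ok)
    if not ok:
        lib.insights.append(propose(problem))
    return ok
```

In this sketch, calling `learn_step` over a training set grows the library only on failures, and a periodic `refine()` call prunes entries that are retrieved often but rarely lead to a matching answer. No model parameters are updated at any point.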
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
Originality and Quality: While the high-level 'learn-from-experience' pipeline is conceptually familiar, the paper's originality stems from its rigorous, domain-specific adaptation to optimization modeling. The design of the structured 4-tuple insight and Library Evolution is well-suited for this domain. Significance: The ability to learn effectively from 'answer-only' supervision is highly significant. This directly addresses a critical bottleneck in the field, as gold-standard formulation pr
Unfair Experimental Comparison (Base Model Disparity): The primary weakness lies in the experimental comparisons in Figure 4 and Table 2. The paper positions AlphaOPT as a framework, but its performance is inherently tied to the underlying base LLM. Figure 1's reference to GPT-4o suggests AlphaOPT utilizes a state-of-the-art proprietary model. This model is orders of magnitude larger and more capable than the baselines used for comparison (i.e., LLaMa3-8B for ORLM and Qwen2.5-14B for LLMOPT). Th
1. Works on the relatively less-explored optimization problems with LLMs. 2. Evidence showing that building an experience library helps the accuracy over time.
1. The math in Section 3.2 lacks detail, and the proof in Appendix D relies on assumptions so strong that they make Theorem 1 trivial. Assumption 2 is far from realistic and cannot be guaranteed in the infinite case in Theorem 2. 1. Is there any evaluation or explanation of why applying LLMs to optimization problems brings benefits compared to using readily available tools? Is there a need for automating this? Otherwise, in what situations would people want to use LLMs? This question is
1) Good motivation. Learning from failed samples generated by the model during scaled inference, in a continuous way, is a good strategy to enhance accuracy. 2) Answers-only learning works and transfers OOD. The method operates well without gold programs (which are hard to acquire for new domains); learning solely from answers (self-explore) yields accuracy comparable to full supervision. 3) Continual macro gains with compact growth. From 100 to 300 training items, performance rises while th
1) Runtime/latency and token accounting are absent. The paper does not report retrieval or inference latency and token/cost budget, making it hard to assess practicality and to compare cost with alternative agentic/test-time scaling methods. 2) Reliance on solver feedback restricts applicability. Learning and verification hinge on a solver producing/validating optimal values; many real-world reasoning tasks lack such verifiers, limiting external validity beyond domains with executable solvers
Problem significance and framing: The work addresses a major bottleneck in applying LLMs to operations research and mathematical optimization—translating informal language into executable solver programs. The authors clearly articulate why both prompt-based (fragile templates) and fine-tuned (data-hungry, low-transfer) systems fail to generalize, motivating a self-improving paradigm through experience collection. Empirical results and generalization: Table 2 and Fig. 4 show strong out-of-distri
Novelty concern: This is a very important concern. AlphaOPT’s framing, i.e., “self-improving experience library” for optimization modeling, sounds conceptually new, but the underlying mechanics draw heavily from existing reflection and retrieval-based self-improvement paradigms, such as Reflexion (Shinn et al., 2023), AlphaEvolve / ReEvo (Novikov et al., 2025; Ye et al., 2024), and Voyager (Wang et al., 2023). OPTIMIZATION PERSPECTIVE (sec 3.2) is also similar to a very recent work that uses a maximu
Taxonomy
Topics: Constraint Satisfaction and Optimization · Machine Learning in Materials Science · Advanced Multi-Objective Optimization Algorithms
