OR-R1: Automating Modeling and Solving of Operations Research Optimization Problem via Test-Time Reinforcement Learning
Zezhen Ding, Zhen Tan, Jiheng Zhang, Tianlong Chen

TL;DR
OR-R1 introduces a data-efficient, two-stage framework leveraging supervised fine-tuning and test-time optimization to automate and improve the modeling and solving of operations research problems, reducing data requirements and enhancing accuracy.
Contribution
The paper presents OR-R1, a novel framework combining supervised fine-tuning and test-time optimization for efficient, scalable OR problem modeling and solving with limited data.
Findings
Achieves 67.7% accuracy using only 1/10 of the synthetic data required by prior methods.
Outperforms previous methods by up to 4.2% in accuracy.
Test-Time Group Relative Policy Optimization improves accuracy by 3.1%-6.4%.
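As a rough illustration of the group-relative idea behind Test-Time Group Relative Policy Optimization, the sketch below standardizes rewards within one group of sampled answers to the same problem, which is the usual GRPO advantage computation. The summary does not say how rewards are obtained at test time, so the majority-vote reward here is an assumption, and the function names `majority_vote_rewards` and `group_relative_advantages` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def majority_vote_rewards(answers):
    """Assumed label-free reward: 1.0 if an answer agrees with the group's
    most common answer, else 0.0. The paper's actual reward may differ."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: each reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical final objective values extracted from four sampled solutions
# to the same optimization problem.
answers = ["42", "42", "17", "42"]
rewards = majority_vote_rewards(answers)
advantages = group_relative_advantages(rewards)
```

In this toy group, the three answers that agree with the majority receive a positive advantage and the outlier a negative one; if every sample agreed, all advantages would be zero and the group would contribute no learning signal.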
Abstract
Optimization modeling and solving are fundamental to the application of Operations Research (OR) in real-world decision making, yet the process of translating natural language problem descriptions into formal models and solver code remains highly expertise-intensive. While recent advances in large language models (LLMs) have opened new opportunities for automation, the generalization ability and data efficiency of existing LLM-based methods are still limited, as most require vast amounts of annotated or synthetic data, resulting in high costs and scalability barriers. In this work, we present OR-R1, a data-efficient training framework for automated optimization modeling and solving. OR-R1 first employs supervised fine-tuning (SFT) to help the model acquire the essential reasoning patterns for problem formulation and code generation from limited labeled data. In addition, it improves the…
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Machine Learning and Data Classification · Constraint Satisfaction and Optimization
