ORThought: Benchmarking and Automating Logistics Optimization Modeling
Beinuo Yang, Qishen Zhou, Junyi Li, Chenxing Su, Panagiotis Angeloudis, Simon Hu

TL;DR
This paper introduces LogiOR, a comprehensive logistics benchmark, and ORThought, a dual-agent framework built on chain-of-thought reasoning, to improve automation and performance in logistics optimization modeling.
Contribution
The paper presents a new benchmark dataset and a structured dual-agent framework that outperforms existing methods on logistics optimization tasks.
Findings
ORThought outperforms state-of-the-art baselines by 9-17 percentage points.
The framework handles complex constraints effectively.
Error analysis reveals key failure modes and success factors.
Abstract
Optimization modeling is the engine of scientific decision-making in logistics and transportation, yet its adoption is hindered by a steep expertise threshold and the latency of manual workflows. Automating this process with Large Language Models (LLMs) offers a potential solution, but current approaches face three critical bottlenecks: (i) a lack of high-quality, complex benchmarks; (ii) methodological inefficiencies in autonomous multi-agent frameworks, which often exhibit instability and redundant computation; and (iii) evaluations that lack diagnostic depth. In this work, we address these challenges on three fronts. First, we introduce LogiOR, a diverse logistics benchmark with rigorous annotations, and enrich existing datasets with the same annotation standard to support community use. Second, we propose ORThought, a structured dual-agent framework. By…
