ORThought: Benchmarking and Automating Logistics Optimization Modeling

Beinuo Yang; Qishen Zhou; Junyi Li; Chenxing Su; Panagiotis Angeloudis; Simon Hu

arXiv:2508.14410 · cs.AI · April 21, 2026

TL;DR

This paper introduces LogiOR, a comprehensive logistics benchmark, and ORThought, a dual-agent framework utilizing chain-of-thought reasoning, to improve automation and performance in logistics optimization modeling.

Contribution

It presents a new benchmark dataset (LogiOR) and a structured dual-agent framework (ORThought) that outperforms existing methods on logistics optimization tasks.

Findings

1. ORThought outperforms state-of-the-art baselines by 9-17 percentage points.
2. The framework handles complex constraints effectively.
3. Error analysis reveals key failure modes and success factors.

Abstract

Optimization modeling stands as the engine of scientific decision-making in logistics and transportation, yet its adoption is hindered by a steep expertise threshold and the latency of manual workflows. Automating this process via Large Language Models (LLMs) offers a potential solution, but current approaches face critical bottlenecks: (i) a lack of high-quality, complex benchmarks; (ii) methodological inefficiencies in autonomous multi-agent frameworks, which often exhibit instability and redundant computation; and (iii) evaluations that lack diagnostic depth. In this work, we address these challenges in three ways. First, we introduce LogiOR, a diverse logistics benchmark with rigorous annotations, and re-annotate existing datasets to the same standard to support community use. Second, we propose ORThought, a structured dual-agent framework. By…
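To make "optimization modeling in logistics" concrete, here is a hypothetical toy transportation problem (not taken from LogiOR or the ORThought repository): two warehouses with limited supply ship to two customers with fixed demand, and the goal is to minimize total shipping cost. An automated pipeline of the kind the abstract describes would need to turn a natural-language description like this into decision variables, an objective, and constraints. In this sketch, brute-force enumeration stands in for a real LP/MILP solver purely to keep it dependency-free; the names SUPPLY, DEMAND, COST, and solve_by_enumeration are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy instance (not from LogiOR): 2 warehouses ship goods to 2 customers.
SUPPLY = {"W1": 20, "W2": 30}  # units available at each warehouse
DEMAND = {"C1": 25, "C2": 25}  # units each customer requires
COST = {  # per-unit shipping cost for each (warehouse, customer) lane
    ("W1", "C1"): 4, ("W1", "C2"): 6,
    ("W2", "C1"): 5, ("W2", "C2"): 3,
}


def solve_by_enumeration():
    """Minimize total shipping cost over integer plans by exhaustive search.

    Brute force stands in for a MILP solver and is only viable for tiny instances.
    """
    best_plan, best_cost = None, float("inf")
    # Choose W1's shipments; the demand equalities then fix W2's shipments.
    for x11, x12 in product(range(SUPPLY["W1"] + 1), repeat=2):
        if x11 + x12 > SUPPLY["W1"]:
            continue  # W1 capacity constraint violated
        x21 = DEMAND["C1"] - x11  # C1 demand must be met exactly
        x22 = DEMAND["C2"] - x12  # C2 demand must be met exactly
        if x21 < 0 or x22 < 0 or x21 + x22 > SUPPLY["W2"]:
            continue  # nonnegativity or W2 capacity violated
        plan = {("W1", "C1"): x11, ("W1", "C2"): x12,
                ("W2", "C1"): x21, ("W2", "C2"): x22}
        total = sum(COST[lane] * qty for lane, qty in plan.items())
        if total < best_cost:
            best_plan, best_cost = plan, total
    return best_plan, best_cost


if __name__ == "__main__":
    plan, total = solve_by_enumeration()
    print(plan, total)
```

For this instance the search finds a minimum cost of 180: W1 sends all 20 units to C1, and W2 sends 5 units to C1 and 25 to C2. A production formulation would state the same objective and constraints symbolically and hand them to a solver rather than enumerating plans.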

Code & Models

Repositories

ZJU-TSELab/ORThought
GitHub

Datasets

LabMem012/LogiOR
dataset · 227 downloads