SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management

Shengyue Guan; Yihao Liu; Lang Cao

arXiv:2602.07342 · cs.AI · May 14, 2026

TL;DR

SupChain-Bench is a comprehensive benchmark designed to evaluate large language models' ability to perform reliable, long-horizon supply chain management tasks grounded in domain-specific procedures.

Contribution

The paper introduces SupChain-Bench, a new benchmark for assessing LLMs in supply chain workflows, and proposes SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool-based orchestration.

Findings

01. Models show substantial gaps in execution reliability.
02. SupChain-ReAct achieves the strongest and most consistent tool-calling performance.
03. The results highlight significant room for improvement in LLM-based supply chain agents.

Abstract

Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain workflows require reliable long-horizon, multi-step orchestration grounded in domain-specific procedures, which remains challenging for current models. To systematically evaluate LLM performance in this setting, we introduce SupChain-Bench, a unified real-world benchmark that assesses both supply chain domain knowledge and long-horizon tool-based orchestration grounded in standard operating procedures (SOPs). Our experiments reveal substantial gaps in execution reliability across models. We further propose SupChain-ReAct, an SOP-free framework that autonomously synthesizes executable procedures for tool use, achieving the strongest and most consistent tool-calling performance. Our work establishes a…
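The abstract does not spell out how execution reliability is scored. As a rough illustration only, the Python sketch below compares a model's tool-call trajectory against a reference SOP step sequence. The `ToolCall` schema, the `call` helper, the tool names (`check_inventory`, `create_purchase_order`, `notify_supplier`), and both metrics are hypothetical assumptions, not SupChain-Bench's actual metrics.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical schema: a tool name plus (key, value) argument pairs.
    # Two calls compare equal only if the name and every argument match.
    name: str
    args: Tuple[Tuple[str, Any], ...] = ()


def call(name: str, **kwargs: Any) -> ToolCall:
    """Hypothetical helper: sorts arguments so keyword order does not matter."""
    return ToolCall(name, tuple(sorted(kwargs.items())))


def step_accuracy(predicted: Sequence[ToolCall], reference: Sequence[ToolCall]) -> float:
    """Fraction of reference SOP steps reproduced at the same position."""
    if not reference:
        return 1.0 if not predicted else 0.0
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)


def trajectory_success(predicted: Sequence[ToolCall], reference: Sequence[ToolCall]) -> bool:
    """Strict criterion: the whole trajectory must match the SOP exactly."""
    return list(predicted) == list(reference)


def success_rate(episodes: Sequence[Tuple[Sequence[ToolCall], Sequence[ToolCall]]]) -> float:
    """Share of (predicted, reference) episodes whose trajectory is fully correct."""
    if not episodes:
        return 0.0
    return sum(trajectory_success(p, r) for p, r in episodes) / len(episodes)


# Illustrative episode: the model orders the wrong quantity at step 2.
reference = [
    call("check_inventory", sku="A1"),
    call("create_purchase_order", sku="A1", qty=100),
    call("notify_supplier", supplier="S7"),
]
predicted = [
    call("check_inventory", sku="A1"),
    call("create_purchase_order", qty=50, sku="A1"),
    call("notify_supplier", supplier="S7"),
]

print(round(step_accuracy(predicted, reference), 3))  # 0.667
print(trajectory_success(predicted, reference))       # False
```

Under the strict per-trajectory criterion, a single wrong argument anywhere in a long SOP fails the whole episode even when most individual steps are correct. This is one plausible reason why long-horizon orchestration separates models more sharply than step-level scores do.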
