Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management

Carol Xuan Long; David Simchi-Levi; Feng Zhu; Huangyuan Su; Andre P. Calmon; Flavio P. Calmon

arXiv:2605.17036 · cs.AI · May 22, 2026

TL;DR

This paper evaluates autonomous AI agents in supply chain management, finding strong average performance but significant reliability risks from decision instability, and proposes a reinforcement learning framework to improve reliability.

Contribution

It introduces the concept of agent bullwhip, analyzes the causes of decision instability, and proposes a reinforcement learning method based on Group Relative Policy Optimization (GRPO) to improve the reliability of autonomous agents.
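GRPO scores each sampled response relative to the other responses drawn for the same prompt, instead of against a learned value function. The sketch below shows only that group-relative advantage step, under a hypothetical setup: each of G sampled ordering decisions for one Beer Game state is rewarded with the negative of its cost. The function name, the costs, and the reward choice are illustrative assumptions; the paper's actual reward design, group size, and training loop may differ.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each response's reward against its own group (GRPO-style advantage)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group (one common choice)
    # eps only guards against division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical group of G = 4 sampled decisions; reward = -(episode cost).
costs = [120.0, 95.0, 150.0, 95.0]
rewards = [-c for c in costs]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Cheaper decisions receive positive advantages and costlier ones negative advantages. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.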

Findings

1. Out-of-the-box reasoning models outperform humans in supply chain tasks.
2. Decision instability amplifies variability across facilities and over time.
3. Post-training with GRPO reduces tail events and improves agent reliability.
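The second finding concerns variability that grows as it moves through the chain. The classic bullwhip effect, from which the term agent bullwhip is borrowed, is commonly measured stage by stage as the variance of the orders a stage places divided by the variance of end-customer demand. The sketch below computes that textbook ratio on made-up order series for the four Beer Game roles; it is an illustration only, not the paper's decision-bullwhip metric, which additionally isolates variability caused by stochastic agent decisions.

```python
import statistics

def bullwhip_ratios(customer_demand, orders_by_stage):
    """Return Var(orders placed by each stage) / Var(end-customer demand)."""
    demand_var = statistics.pvariance(customer_demand)
    return {
        stage: statistics.pvariance(orders) / demand_var
        for stage, orders in orders_by_stage.items()
    }

# Illustrative (made-up) weekly series, listed from downstream to upstream.
demand = [4, 4, 4, 8, 8, 8]
orders = {
    "retailer":    [4, 4, 4, 10, 8, 6],
    "wholesaler":  [4, 4, 2, 14, 8, 4],
    "distributor": [4, 4, 0, 18, 10, 0],
    "factory":     [4, 2, 0, 22, 8, 0],
}
for stage, ratio in bullwhip_ratios(demand, orders).items():
    print(f"{stage:12s} {ratio:.2f}")
```

With these invented numbers the ratio rises from about 1.33 at the retailer to about 14.67 at the factory; any ratio above 1 means a stage amplifies the demand variability it sees.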

Abstract

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and…
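The abstract defines decision bullwhip as the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. One hypothetical way to estimate such a share, assumed here for illustration and not taken from the paper, is to replay the same demand sequence over several independent runs and apply the law of total variance: the spread across runs within each period is attributable to the agent's decisions, while the spread of the per-period mean order tracks demand. The function name `decision_share` and the example runs are illustrative.

```python
import statistics

def decision_share(orders_by_run):
    """Fraction of pooled order variance due to run-to-run spread (hypothetical estimator).

    orders_by_run[r][t] is the order placed in period t of run r. Every run is
    assumed to face the same customer-demand sequence, so differences between
    runs in the same period come from the agent's stochastic decisions.
    """
    n_periods = len(orders_by_run[0])
    per_period = [[run[t] for run in orders_by_run] for t in range(n_periods)]
    # Mean within-period (across-run) variance: the decision-driven component.
    within = statistics.fmean(statistics.pvariance(col) for col in per_period)
    # Variance of the per-period mean order: the component that tracks demand.
    between = statistics.pvariance([statistics.fmean(col) for col in per_period])
    total = within + between  # equals the pooled population variance for equal-length runs
    return within / total if total else 0.0

# Three hypothetical runs of the same agent on one fixed demand path.
runs = [
    [4, 4, 8, 8],
    [4, 6, 8, 10],
    [4, 2, 8, 6],
]
print(round(decision_share(runs), 3))
```

Here a quarter of the pooled order variance comes from run-to-run spread; a fully deterministic agent, which places identical orders in every run, would score 0.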
