Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management
Carol Xuan Long, David Simchi-Levi, Feng Zhu, Huangyuan Su, Andre P. Calmon, Flavio P. Calmon

TL;DR
This paper evaluates autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. Reasoning models achieve strong average performance, but decision instability creates significant reliability risks; the paper proposes a GRPO-based reinforcement learning framework to reduce them.
Contribution
The paper introduces agent bullwhip, the amplification of run-to-run decision instability in autonomous multi-echelon systems, analyzes the causes of that instability, and proposes a GRPO-based reinforcement learning method to make autonomous agents more reliable.
Findings
An out-of-the-box reasoning model exceeds human-level performance in the MIT Beer Game, and optimized reasoning models reduce costs by up to 67% relative to human teams.
Decision instability amplifies variability across facilities and over time.
Post-training with GRPO reduces tail events and improves agent reliability.
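To make the GRPO finding concrete, the sketch below shows the group-relative advantage step that GRPO is generally built around: several rollouts are sampled for the same scenario, and each rollout's reward is standardized against its own group. This is an illustrative sketch, not the paper's implementation. The reward definition (negative total cost), the cost values, and the function name `group_relative_advantages` are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each rollout's reward against its own sampling group.

    eps guards against division by zero when all rewards are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical total costs from four rollouts of the same Beer Game scenario.
# The last rollout is a tail event with a much higher cost.
costs = [120.0, 135.0, 128.0, 410.0]

# Assumption: reward is the negative of total cost, so cheaper runs score higher.
rewards = [-c for c in costs]
advantages = group_relative_advantages(rewards)

for cost, adv in zip(costs, advantages):
    print(f"cost={cost:7.1f}  advantage={adv:+.3f}")
```

Under this assumed reward, the tail-event rollout receives the most negative advantage in its group, so the policy update pushes probability mass away from the decisions that produced it.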
Abstract
This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and…
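One way to read the decision bullwhip definition is as a variance decomposition: repeat each demand scenario several times and split order variance into a part explained by the scenario and a part that remains across runs of the same scenario. The sketch below uses the law of total variance for that split. It is an assumed illustration, since the paper's exact estimator is not given in this summary; the function name and the order quantities are hypothetical.

```python
from statistics import mean, pvariance


def decompose_order_variance(orders_by_scenario):
    """Split order variance into demand-driven and decision-driven parts.

    orders_by_scenario[i] holds one facility's order quantity from repeated
    runs under demand scenario i. Every scenario must have the same number
    of runs for the two parts to sum exactly to the pooled variance.
    """
    # Decision-driven: average variance across runs with demand held fixed.
    within = mean([pvariance(runs) for runs in orders_by_scenario])
    # Demand-driven: variance of the per-scenario mean order.
    between = pvariance([mean(runs) for runs in orders_by_scenario])
    return between, within


# Hypothetical orders: two demand scenarios, two runs of the agent in each.
orders = [[4.0, 6.0], [8.0, 12.0]]
demand_part, decision_part = decompose_order_variance(orders)

pooled = pvariance([q for runs in orders for q in runs])
print(demand_part, decision_part, pooled)
```

With equal run counts per scenario, the two parts add up to the pooled variance, so the decision-driven share can be reported as `decision_part / pooled`.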
