AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager

Xuhua Zhao; Yuxuan Xie; Caihua Chen; Yuxiang Sun

arXiv:2508.11416 · cs.AI · August 18, 2025

TL;DR

This paper introduces AIM-Bench, a benchmark for evaluating the decision-making biases of LLM agents in inventory management. It reveals biases similar to those of human decision-makers and explores mitigation strategies for supply chain applications.

Contribution

The paper presents AIM-Bench, the first benchmark for assessing LLM decision biases in supply chain scenarios, and investigates bias mitigation strategies like cognitive reflection and information sharing.

Findings

01

LLMs exhibit human-like decision biases in inventory tasks.

02

Mitigation strategies such as cognitive reflection reduce these biases.

03

Bias mitigation improves decision quality in supply chain contexts.

Abstract

Recent advances in the mathematical reasoning and long-term planning capabilities of large language models (LLMs) have precipitated the development of agents, which are being increasingly leveraged in business operations processes. Decision models to optimize inventory levels are one of the core elements of operations management. However, the capabilities of LLM agents in making inventory decisions in uncertain contexts, as well as the decision-making biases (e.g., the framing effect) of such agents, remain largely unexplored. This prompts concerns regarding the capacity of LLM agents to effectively address real-world problems, as well as the potential implications of biases that may be present. To address this gap, we introduce AIM-Bench, a novel benchmark designed to assess the decision-making behaviour of LLM agents in uncertain supply chain management scenarios through a diverse…
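To make "inventory decisions in uncertain contexts" concrete, here is a minimal sketch of how a single order decision could be scored against an optimal benchmark. It assumes the classic single-period newsvendor model with normally distributed demand and no salvage value, and it measures one well-known human bias, anchoring on mean demand ("pull-to-center"). The excerpt above does not describe AIM-Bench's actual environments or metrics, so the model, the function names, and the bias measure are illustrative assumptions, not the authors' method.

```python
from statistics import NormalDist


def optimal_order(mean: float, std: float, price: float, cost: float) -> float:
    """Newsvendor optimum under normal demand (assumed model, no salvage value).

    The optimal order is the demand quantile at the critical fractile
    underage / (underage + overage) = (price - cost) / price.
    """
    if not 0 < cost < price:
        raise ValueError("expected 0 < cost < price")
    critical_fractile = (price - cost) / price
    return NormalDist(mean, std).inv_cdf(critical_fractile)


def mean_anchoring_factor(order: float, mean: float, std: float,
                          price: float, cost: float) -> float:
    """Hypothetical bias score: 0 means the order is optimal,
    1 means the order sits exactly at mean demand."""
    q_star = optimal_order(mean, std, price, cost)
    if q_star == mean:
        raise ValueError("optimum equals mean demand; the score is undefined")
    return (q_star - order) / (q_star - mean)


if __name__ == "__main__":
    # Illustrative numbers only: demand ~ N(100, 20), price 12, unit cost 3.
    q_star = optimal_order(100, 20, 12, 3)
    agent_order = 105  # a made-up agent decision between the mean and the optimum
    print(f"optimal order: {q_star:.1f}")
    print(f"anchoring factor: {mean_anchoring_factor(agent_order, 100, 20, 12, 3):.2f}")
```

A score between 0 and 1 would indicate an order pulled from the optimum toward mean demand. Averaging such a score over many episodes is one plausible way to compare an agent's behaviour with the patterns reported for human decision-makers.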
