# Validating Generative Agent-Based Models for Logistics and Supply Chain Management Research

**Authors:** Vincent E. Castillo

arXiv: 2508.20234 · 2025-08-29

## TL;DR

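This paper tests whether generative agent-based models (GABMs) powered by large language models (LLMs) can validly stand in for human behavior in logistics and supply chain management (LSCM) research. It compares six LLMs against 957 human participants in a food delivery experiment. Some LLMs match humans at the surface level yet show decision processes not present in humans, so both levels need validation.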

## Contribution

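The paper introduces a dual-validation framework for assessing LLM-based GABMs in LSCM. The framework combines (1) human equivalence testing, done here with Two One-Sided Tests (TOST), and (2) decision process validation, done here with structural equation modeling (SEM).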

## Key findings

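- Two One-Sided Tests (TOST) show that some of the six LLMs tested reach surface-level behavioral equivalence to human participants.
- Structural equation modeling (SEM) reveals artificial decision processes in some LLMs that are not present in human participants.
- With proper validation checks, GABMs are a potentially viable methodological instrument for LSCM research.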
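The gap between the first two findings can be illustrated with a toy mediation check. The sketch below is not the paper's SEM or its moderated mediation model; it is a simplified regression stand-in that estimates an indirect effect (path X to M times path M to Y) with ordinary least squares. All variable names and the synthetic data are hypothetical and chosen only so that two data sets share the same total effect while differing in process.

```python
from statistics import mean


def slope(x, y):
    """OLS slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect of x on y through m."""
    a = slope(x, m)              # path X -> M
    m_res = residuals(x, m)      # part of M not explained by X
    b = slope(m_res, y)          # path M -> Y, controlling for X
    return a * b


# Synthetic, hypothetical data: x is a binary condition, m a mediator.
x = [0, 0, 0, 0, 1, 1, 1, 1]
e = [-1, 1, -1, 1, -1, 1, -1, 1]
m = [2 * xi + ei for xi, ei in zip(x, e)]

# "Human-like" outcome: x acts mostly through the mediator m.
y_human = [3 * mi + xi for mi, xi in zip(m, x)]
# "LLM-like" outcome: same total effect of x, but the mediator plays no role.
y_llm = [7 * xi for xi in x]

total_human, total_llm = slope(x, y_human), slope(x, y_llm)
ind_human, ind_llm = indirect_effect(x, m, y_human), indirect_effect(x, m, y_llm)
```

In this toy setup the total effect of the condition is identical in both data sets, while the indirect effect differs, which is the kind of divergence a surface-level comparison alone would miss.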

## Abstract

Generative Agent-Based Models (GABMs) powered by large language models (LLMs) offer promising potential for empirical logistics and supply chain management (LSCM) research by enabling realistic simulation of complex human behaviors. Unlike traditional agent-based models, GABMs generate human-like responses through natural language reasoning, which creates potential for new perspectives on emergent LSCM phenomena. However, the validity of LLMs as proxies for human behavior in LSCM simulations is unknown. This study evaluates LLM equivalence of human behavior through a controlled experiment examining dyadic customer-worker engagements in food delivery scenarios. I test six state-of-the-art LLMs against 957 human participants (477 dyads) using a moderated mediation design. This study reveals a need to validate GABMs on two levels: (1) human equivalence testing, and (2) decision process validation. Results reveal GABMs can effectively simulate human behaviors in LSCM; however, an equivalence-versus-process paradox emerges. While a series of Two One-Sided Tests (TOST) for equivalence reveals some LLMs demonstrate surface-level equivalence to humans, structural equation modeling (SEM) reveals artificial decision processes not present in human participants for some LLMs. These findings show GABMs as a potentially viable methodological instrument in LSCM with proper validation checks. The dual-validation framework also provides LSCM researchers with a guide to rigorous GABM development. For practitioners, this study offers evidence-based assessment for LLM selection for operational tasks.
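As a rough illustration of the equivalence testing mentioned in the abstract, the following is a minimal TOST sketch for two independent samples. It assumes a large-sample normal approximation instead of the t distribution, and the function name, the `margin` value, and the synthetic ratings are hypothetical; the abstract does not state the equivalence bounds or the exact test configuration the paper used.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def tost_equivalence(human, llm, margin, alpha=0.05):
    """Two One-Sided Tests for equivalence of two independent means.

    Assumption of this sketch: samples are large enough for a normal
    approximation, and `margin` is a symmetric equivalence bound.
    """
    diff = mean(llm) - mean(human)
    se = sqrt(variance(human) / len(human) + variance(llm) / len(llm))
    z = NormalDist()
    # Test 1: H0 is diff <= -margin; a small p-value rejects it.
    p_lower = 1.0 - z.cdf((diff + margin) / se)
    # Test 2: H0 is diff >= +margin; a small p-value rejects it.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is claimed only if BOTH one-sided tests reject.
    p_tost = max(p_lower, p_upper)
    return {
        "diff": diff,
        "p_lower": p_lower,
        "p_upper": p_upper,
        "p_tost": p_tost,
        "equivalent": p_tost < alpha,
    }


# Synthetic 7-point ratings, for illustration only.
human_scores = [i % 7 + 1 for i in range(200)]
llm_scores = [(i + 3) % 7 + 1 for i in range(200)]
llm_shifted = [s + 1.0 for s in llm_scores]

close = tost_equivalence(human_scores, llm_scores, margin=0.5)
far = tost_equivalence(human_scores, llm_shifted, margin=0.5)
```

Note the direction of the logic: unlike a standard difference test, the null hypothesis here is that the groups differ by at least the margin, so a non-significant result does not support equivalence.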

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20234/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20234/full.md

## References

103 references — full list in the complete paper: https://tomesphere.com/paper/2508.20234/full.md

---
Source: https://tomesphere.com/paper/2508.20234