Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, and Marijn van Rijswijk

arXiv:2507.02698 · cs.LG · July 4, 2025

TL;DR

This paper benchmarks multi-agent reinforcement learning algorithms for dynamic pricing in supply chains against static rule-based baselines in a simulated market, comparing the strategic behaviour each approach produces with the fairness and price stability it achieves.

Contribution

The paper introduces a benchmarking framework for MARL algorithms in supply chain pricing, highlighting their ability to model strategic behaviours that traditional static systems lack.

Findings

01 MADQN prices aggressively, with high volatility.

02 MADDPG balances market competition and fairness.

03 Rule-based agents achieve near-perfect fairness (Jain's Index: 0.9896) and the highest price stability (volatility: 0.024).

Abstract

This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. Although recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating three MARL algorithms (MADDPG, MADQN, and QMIX) against static rule-based baselines in a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain's Index: 0.9896) and the highest price stability (volatility: 0.024), but they fully lack competitive…
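
The fairness figure quoted above is Jain's Index, which for n non-negative allocations x_1, ..., x_n is (sum of x_i)^2 / (n * sum of x_i^2); it ranges from 1/n (one agent receives everything) to 1 (all agents receive the same). The following is a minimal sketch of that formula, not the authors' implementation. The excerpt does not say which per-agent quantity the index is computed over, so the revenue figures and the function name `jains_index` are assumptions made purely for illustration.

```python
from typing import Sequence


def jains_index(allocations: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    if n == 0:
        raise ValueError("need at least one allocation")
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if sum_sq == 0:
        # All allocations are zero; treating that as perfectly equal
        # is a convention assumed here, not something from the paper.
        return 1.0
    return (total * total) / (n * sum_sq)


# Hypothetical per-agent revenues (illustrative values, not data from the paper).
equal = jains_index([100.0, 100.0, 100.0])   # every agent earns the same
skewed = jains_index([300.0, 0.0, 0.0])      # one agent captures everything

print(f"equal split:  {equal:.4f}")
print(f"skewed split: {skewed:.4f}")
```

With three agents the index is bounded below by 1/3, so a value such as the reported 0.9896 indicates allocations that are close to, but not exactly, equal.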
