MARL Warehouse Robots

Price Allman; Lian Thang; Dre Simmons; Salmon Riaz

arXiv:2512.04463 · cs.AI · December 10, 2025

TL;DR

This study compares two MARL algorithms, QMIX and IPPO, for cooperative warehouse robots. It shows that QMIX performs better but needs extensive hyperparameter tuning, and that scaling to larger deployments remains a challenge.

Contribution

The paper provides a comparative analysis of MARL algorithms (QMIX and IPPO) for warehouse robotics, demonstrating QMIX's effectiveness and discussing the practical challenges of deployment.

Findings

01. QMIX outperforms IPPO in mean return (3.25 vs. 0.38).

02. Sparse-reward environments require extensive hyperparameter tuning.

03. The Unity ML-Agents deployment achieves consistent package delivery after 1 million training steps.
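The tuning finding is tied in the abstract to extended epsilon annealing (5M+ steps) for sparse-reward discovery. The sketch below shows what such a schedule might look like. It assumes a linear decay from 1.0 to a floor of 0.05, which is a common choice; only the 5M-step horizon comes from the paper, and the function names are hypothetical.

```python
import random


def linear_epsilon(step: int, start: float = 1.0, end: float = 0.05,
                   anneal_steps: int = 5_000_000) -> float:
    """Linearly decay epsilon from `start` to `end` over `anneal_steps`, then hold at `end`.

    The start/end values and the linear shape are illustrative assumptions;
    only the ~5M-step horizon is taken from the paper's abstract.
    """
    if anneal_steps <= 0:
        return end
    frac = min(max(step, 0) / anneal_steps, 1.0)  # fraction of the schedule completed
    return start + frac * (end - start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With this assumed 5M-step horizon, epsilon is still about 0.81 at 1M steps, so agents keep exploring long after a shorter schedule would have turned greedy. That extra exploration is the intuition behind needing a long schedule when rewards are sparse.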

Abstract

We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and in a custom Unity 3D simulation. Our experiments show that QMIX's value decomposition significantly outperforms independent learning approaches (a mean return of 3.25 vs. 0.38 for advanced IPPO) but requires extensive hyperparameter tuning, particularly extended epsilon annealing (5M+ steps) so that agents discover the sparse rewards. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
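QMIX's value decomposition combines per-agent Q-values into a joint Q_tot through a mixing network. The mixer's weights are produced by hypernetworks conditioned on the global state and are kept non-negative, so that Q_tot is monotonic in each agent's Q-value. The following is a minimal pure-Python sketch of that forward pass, following the mixer described in the original QMIX paper (Rashid et al., 2018). It is not the authors' implementation: the class name `QMixSketch`, the layer sizes, and the random untrained weights are illustrative assumptions.

```python
import math
import random


def elu(x):
    """ELU activation (alpha = 1), monotonically increasing."""
    return x if x > 0 else math.exp(x) - 1.0


def init_linear(in_dim, out_dim, rng, scale=0.5):
    """Random (untrained) weights W (out_dim rows x in_dim cols) and bias b."""
    W = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(-scale, scale) for _ in range(out_dim)]
    return W, b


def linear(params, x):
    """Compute y = W x + b for params = (W, b)."""
    W, b = params
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


class QMixSketch:
    """Hypothetical, untrained QMIX-style mixer for illustration only."""

    def __init__(self, n_agents, state_dim, embed_dim=8, seed=0):
        rng = random.Random(seed)
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks map the global state to the mixer's weights and biases.
        self.hyper_w1 = init_linear(state_dim, n_agents * embed_dim, rng)
        self.hyper_b1 = init_linear(state_dim, embed_dim, rng)
        self.hyper_w2 = init_linear(state_dim, embed_dim, rng)
        # The final bias uses a two-layer hypernetwork with a ReLU in between.
        self.hyper_b2 = (init_linear(state_dim, embed_dim, rng),
                         init_linear(embed_dim, 1, rng))

    def q_tot(self, agent_qs, state):
        # abs() makes the mixing weights non-negative, so dQ_tot/dQ_i >= 0.
        w1 = [abs(w) for w in linear(self.hyper_w1, state)]
        b1 = linear(self.hyper_b1, state)
        hidden = [
            elu(sum(agent_qs[i] * w1[i * self.embed_dim + k]
                    for i in range(self.n_agents)) + b1[k])
            for k in range(self.embed_dim)
        ]
        w2 = [abs(w) for w in linear(self.hyper_w2, state)]
        b2_hidden = [max(0.0, h) for h in linear(self.hyper_b2[0], state)]
        b2 = linear(self.hyper_b2[1], b2_hidden)[0]
        return sum(h * w for h, w in zip(hidden, w2)) + b2
```

Monotonicity is what lets each robot act greedily on its own Q-values during decentralized execution. Because Q_tot cannot decrease when any agent's Q-value increases, the combination of per-agent argmax actions also maximizes Q_tot. IPPO, by contrast, trains each agent's policy independently and has no such joint value to coordinate through.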

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotics and Sensor-Based Localization · Robot Manipulation and Learning