MARL Warehouse Robots
Price Allman, Lian Thang, Dre Simmons, Salmon Riaz

TL;DR
This study compares the MARL algorithms QMIX and IPPO for cooperative warehouse robots. QMIX performs better, but it requires extensive hyperparameter tuning, and scaling to larger deployments remains a challenge.
Contribution
The paper provides a comparative analysis of MARL algorithms for warehouse robotics, demonstrates QMIX's effectiveness, and discusses practical deployment challenges.
Findings
QMIX outperforms IPPO in mean return (3.25 vs. 0.38)
QMIX requires extensive hyperparameter tuning in sparse-reward environments, particularly extended epsilon annealing (5M+ steps)
Deployment in Unity ML-Agents achieved consistent package delivery after 1M training steps
Abstract
We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (achieving 3.25 mean return vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning -- particularly extended epsilon annealing (5M+ steps) for sparse reward discovery. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
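The extended epsilon annealing mentioned in the abstract can be illustrated with a minimal sketch of a linear exploration schedule. Only the 5M-step annealing horizon comes from the abstract; the function name `linear_epsilon`, the start and end values (1.0 and 0.05), and the linear shape are assumptions chosen for illustration and may differ from the authors' configuration.

```python
def linear_epsilon(step, anneal_steps=5_000_000, eps_start=1.0, eps_end=0.05):
    """Linearly anneal epsilon from eps_start to eps_end over anneal_steps.

    anneal_steps follows the 5M+ figure in the abstract; eps_start and
    eps_end are assumed values, not taken from the paper.
    """
    if anneal_steps <= 0:
        raise ValueError("anneal_steps must be positive")
    # Fraction of the schedule completed, clamped to [0, 1].
    frac = min(max(step, 0) / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


if __name__ == "__main__":
    # Print epsilon at a few points along the schedule.
    for step in (0, 1_000_000, 2_500_000, 5_000_000, 10_000_000):
        print(f"step={step:>10,d}  epsilon={linear_epsilon(step):.3f}")
```

With a longer horizon, agents keep taking random actions for more of training, which gives them more chances to stumble on a rare delivery reward before the policy becomes mostly greedy.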
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotics and Sensor-Based Localization · Robot Manipulation and Learning
