POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov, Alexander Chernyavskiy, Konstantin Yakovlev, Aleksandr Panov

TL;DR
POGEMA is a comprehensive benchmarking platform designed for fair comparison of classical, learning-based, and hybrid multi-agent pathfinding methods, supporting evaluation, visualization, and standardized metrics.
Contribution
It introduces a unified framework with tools and protocols for benchmarking diverse multi-agent pathfinding approaches in a fair and systematic manner.
Findings
State-of-the-art methods are systematically compared using the platform.
The platform enables fair evaluation across different approaches.
Benchmark results highlight strengths and weaknesses of various methods.
Abstract
Multi-agent reinforcement learning (MARL) has recently excelled in solving challenging cooperative and competitive multi-agent problems in various environments, typically involving a small number of agents and full observability. Moreover, a range of crucial robotics-related tasks, such as multi-robot pathfinding, which have traditionally been approached with classical non-learnable methods (e.g., heuristic search), are now being tackled with learning-based or hybrid methods. However, in this domain, it remains difficult, if not impossible, to conduct a fair comparison between classical, learning-based, and hybrid approaches due to the lack of a unified framework that supports both learning and evaluation. To address this, we introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, a collection of…
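The abstract lists a problem instance generator without describing how it works. As a rough illustration only, the Python sketch below builds a random grid MAPF instance: each cell becomes an obstacle independently with probability `density`, and starts and goals are drawn from the largest 4-connected free region, so every goal is reachable from every start. The names `generate_instance` and `largest_component` and all of their parameters are hypothetical; they are not POGEMA's API, and POGEMA's own generator may work differently.

```python
import random
from collections import deque


def largest_component(grid):
    """Return the set of free cells (value 0) in the largest 4-connected component."""
    h, w = len(grid), len(grid[0])
    seen = set()
    best = set()
    for r in range(h):
        for c in range(w):
            if grid[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited free cell.
            comp = {(r, c)}
            seen.add((r, c))
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if 0 <= nr < h and 0 <= nc < w and not grid[nr][nc] and (nr, nc) not in seen:
                        seen.add((nr, nc))
                        comp.add((nr, nc))
                        queue.append((nr, nc))
            if len(comp) > len(best):
                best = comp
    return best


def generate_instance(size, density, num_agents, seed):
    """Hypothetical generator: random obstacles, then distinct starts and distinct goals."""
    rng = random.Random(seed)
    # 1 = obstacle, 0 = free cell.
    grid = [[1 if rng.random() < density else 0 for _ in range(size)] for _ in range(size)]
    # Sorting makes the result depend only on the seed, not on set iteration order.
    cells = sorted(largest_component(grid))
    if len(cells) < num_agents:
        raise ValueError("not enough connected free cells for the requested agents")
    starts = rng.sample(cells, num_agents)  # pairwise-distinct starts
    goals = rng.sample(cells, num_agents)   # pairwise-distinct goals
    return grid, starts, goals


grid, starts, goals = generate_instance(16, 0.2, 8, seed=42)
```

Restricting placement to one connected region is a simple way to guarantee solvability of individual start-goal pairs; it does not guarantee that the joint multi-agent instance is solvable, and in this sketch an agent's start may coincide with its own goal.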
Peer Reviews
Decision: ICLR 2025 Poster
The writing of this paper is crisp, and the visualizations are of excellent quality. The authors provide several examples and extensive code. While this is not the first paper to focus on MARL navigation, it is the first to fully focus on MAPF variants in a single repository. The proposed metrics are a much-needed tool to assess the performance of (L)MAPF approaches, not just based on the classical SoC/throughput but also on other metrics that can identify possible research directions.
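The review above refers to sum-of-costs (SoC) and throughput. Under common textbook definitions (assumed here; POGEMA's exact accounting may differ), an agent's cost in classical MAPF is the earliest time step from which it stays at its goal, SoC sums these costs, makespan takes their maximum, and throughput in lifelong MAPF is the average number of goals completed per time step. A minimal Python sketch with hypothetical helper names:

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def arrival_time(path: List[Cell], goal: Cell) -> int:
    """Earliest time step t such that the agent is at `goal` for every step from t onward."""
    if not path or path[-1] != goal:
        raise ValueError("path does not end at the goal")
    t = len(path) - 1
    # Walk back over trailing wait actions at the goal.
    while t > 0 and path[t - 1] == goal:
        t -= 1
    return t


def _costs(paths: List[List[Cell]], goals: List[Cell]) -> List[int]:
    if len(paths) != len(goals):
        raise ValueError("need exactly one goal per path")
    return [arrival_time(p, g) for p, g in zip(paths, goals)]


def sum_of_costs(paths: List[List[Cell]], goals: List[Cell]) -> int:
    """Classical MAPF: total of per-agent arrival times."""
    return sum(_costs(paths, goals))


def makespan(paths: List[List[Cell]], goals: List[Cell]) -> int:
    """Classical MAPF: arrival time of the last agent."""
    return max(_costs(paths, goals))


def throughput(goals_reached_per_step: List[int]) -> float:
    """Lifelong MAPF: average number of goals completed per time step."""
    if not goals_reached_per_step:
        return 0.0
    return sum(goals_reached_per_step) / len(goals_reached_per_step)


paths = [[(0, 0), (0, 1), (0, 2)], [(1, 0), (1, 1), (1, 1), (1, 1)]]
goals = [(0, 2), (1, 1)]
print(sum_of_costs(paths, goals), makespan(paths, goals))  # 3 2
print(throughput([0, 1, 0, 2, 1]))  # 0.8
```

Trailing waits at the goal are not charged, but an agent that reaches its goal, leaves, and returns is charged up to its final arrival, which is the usual convention for SoC.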
1. I believe the title and abstract are misleading about the scope of the paper: while POGEMA appeals to a broad audience of MARL-based navigation in the title and abstract, it is in fact about two variants of MAPF on discrete grids with simplified settings. For instance, in terms of MAPF, continuous variants like [1] do not appear to be considered. Moreover, there does not seem to be any mention of any-angle versions that would make the problem more realistic and interesting, like [2]. 2.
1. The overall platform proposed in this paper is capable of integrating search-based, learning-based, and hybrid approaches together, using the same metrics for comparison.
2. As an environment that supports large-scale multi-agent reinforcement learning algorithm training, performing more than 10K steps per second is quite remarkable.
(Major) The primary issue with this paper is that its contributions are not sufficiently prominent. As a benchmark in the multi-agent systems domain, POGEMA's positioning is unclear and it lacks clearly distinctive advantages. For instance, for a researcher in the MARL domain, it may mainly offer features related to partial observability, rapid training, and scalability to >1000 agents. However, benchmarks already exist in the MARL domain that possess these characteristics, such as [1][2], making this wo
- I like the extensive comparison and differentiation to previous multi-agent environments.
- I appreciate the large number of evaluated algorithms on the new benchmark.
- The analysis and experimental studies in this paper are strong.
- The new environment seems to be particularly interesting when evaluating a large number of agents in large environments.
- I like that the authors promise to release the code under an open-access license.
- While I appreciate the computational efficiency and the ability to procedurally generate new environments, I am wondering after reading the paper if the benchmark can further the field by sparking new ideas or raising open problems. The benchmark seems more like an engineering achievement allowing the study of larger agent populations, but I am wondering if this is really _the_ open problem we need to consider in MARL.
- Similarly, the observation space seems relatively simple. I may have miss
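Two of the reviews touch on partial observability and on how simple the observation space is. As an illustration of what an egocentric grid observation of this kind typically looks like, the Python sketch below crops a (2R+1) x (2R+1) window around one agent and returns three binary layers: obstacles (cells outside the map are treated as obstacles), other agents, and the agent's own goal if it falls inside the window. The function name `local_observation` and this particular layer layout are assumptions for illustration, not a description of POGEMA's actual observation encoding.

```python
def local_observation(grid, positions, agent_idx, goal, radius):
    """Hypothetical egocentric view: three (2*radius+1) x (2*radius+1) binary layers.

    grid: 2-D list with 1 = obstacle, 0 = free; positions: (row, col) of every agent.
    """
    size = 2 * radius + 1
    h, w = len(grid), len(grid[0])
    ar, ac = positions[agent_idx]
    others = {p for i, p in enumerate(positions) if i != agent_idx}
    obstacles = [[0] * size for _ in range(size)]
    agents = [[0] * size for _ in range(size)]
    target = [[0] * size for _ in range(size)]
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = ar + dr, ac + dc          # global cell
            i, j = dr + radius, dc + radius  # index inside the window
            # Out-of-map cells and real obstacles both block movement.
            if not (0 <= r < h and 0 <= c < w) or grid[r][c]:
                obstacles[i][j] = 1
                continue
            if (r, c) in others:
                agents[i][j] = 1
            if (r, c) == goal:
                target[i][j] = 1
    return obstacles, agents, target


grid = [[0, 1, 0],
        [0, 0, 0],
        [0, 0, 0]]
obs, ags, tgt = local_observation(grid, [(0, 0), (1, 1)], 0, goal=(1, 0), radius=1)
```

In this simplified version, a goal outside the window is simply not visible; a real environment might instead encode its direction, for example by projecting it onto the window border.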
Taxonomy
Topics: Data Management and Algorithms · Robotic Path Planning Algorithms · Multi-Agent Systems and Negotiation
Methods: Sparse Evolutionary Training
