CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale

Jonathan Hyun; Nicholas R Waytowich; Boyuan Chen

arXiv:2507.05178 · cs.MA · December 15, 2025

Open Access

TL;DR

CREW-Wildfire is a comprehensive benchmark designed to evaluate large-scale, multi-agent AI systems in complex wildfire response scenarios, addressing limitations of existing small-scale, low-complexity environments.

Contribution

CREW-Wildfire introduces a realistic, scalable wildfire response environment with diverse agents and tasks, enabling assessment of advanced multi-agent coordination and planning capabilities.

Findings

01

State-of-the-art LLM-based frameworks show significant performance gaps.

02

The benchmark highlights challenges in large-scale coordination and long-horizon planning.

03

CREW-Wildfire provides a foundation for future research in scalable multi-agent AI.

Abstract

Despite rapid progress in large language model (LLM)-based multi-agent systems, current benchmarks fall short in evaluating their scalability, robustness, and coordination capabilities in complex, dynamic, real-world tasks. Existing environments typically focus on small-scale, fully observable, or low-complexity domains, limiting their utility for developing and assessing next-generation multi-agent Agentic AI frameworks. We introduce CREW-Wildfire, an open-source benchmark designed to close this gap. Built atop the human-AI teaming CREW simulation platform, CREW-Wildfire offers procedurally generated wildfire response scenarios featuring large maps, heterogeneous agents, partial observability, stochastic dynamics, and long-horizon planning objectives. The environment supports both low-level control and high-level natural language interactions through modular Perception and Execution…

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Multimodal Machine Learning Applications · Topic Modeling