VULCAN: Vision-Language-Model Enhanced Multi-Agent Cooperative Navigation for Indoor Fire-Disaster Response

Shengding Liu; Qiben Yan

arXiv:2604.12831 · cs.RO · April 15, 2026

TL;DR

VULCAN introduces a multi-agent cooperative navigation framework that combines multi-modal perception with vision-language models (VLMs) to improve indoor fire-disaster response, targeting challenges such as dense smoke and thermal hazards.

Contribution

The paper develops VULCAN, a multi-agent system with enhanced perception for fire scenarios, and extends the Habitat-Matterport3D benchmark with physically realistic fire simulations covering smoke diffusion, thermal hazards, and sensor degradation.

Findings

01

Existing vision-based multi-agent navigation methods degrade significantly under fire conditions due to perception issues.

02

VULCAN demonstrates improved robustness in simulated fire environments.

03

The results highlight the importance of hazard-aware planning for rescue missions.

Abstract

Indoor fire disasters pose severe challenges to autonomous search and rescue due to dense smoke, high temperatures, and dynamically evolving indoor environments. In such time-critical scenarios, multi-agent cooperative navigation is particularly useful, as it enables faster and broader exploration than single-agent approaches. However, existing multi-agent navigation systems are primarily vision-based and designed for benign indoor settings, leading to significant performance degradation under fire-driven dynamic conditions. In this paper, we present VULCAN, a multi-agent cooperative navigation framework based on multi-modal perception and vision-language models (VLMs), tailored for indoor fire disaster response. We extend the Habitat-Matterport3D benchmark by simulating physically realistic fire scenarios, including smoke diffusion, thermal hazards, and sensor degradation. We evaluate…