DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
Zixuan Liu, Siavash H. Khajavi, Guangkai Jiang

TL;DR
DetectiumFire is a large, multi-modal dataset of fire images and videos with detailed annotations, designed to advance fire understanding and safety applications in AI research.
Contribution
We introduce DetectiumFire, a comprehensive fire dataset with high-quality annotations, enabling improved model training for fire detection, reasoning, and synthetic data generation.
Findings
Enhanced performance in object detection tasks
Effective for diffusion-based image generation
Improved vision-language reasoning capabilities
Abstract
Recent advances in multi-modal models have demonstrated strong performance in tasks such as image generation and reasoning. However, applying these models to the fire domain remains challenging due to the lack of publicly available datasets with high-quality fire domain annotations. To address this gap, we introduce DetectiumFire, a large-scale, multi-modal dataset comprising 22.5k high-resolution fire-related images and 2.5k real-world fire-related videos covering a wide range of fire types, environments, and risk levels. The data are annotated with both traditional computer vision labels (e.g., bounding boxes) and detailed textual prompts describing the scene, enabling applications such as synthetic data generation and fire risk reasoning. DetectiumFire offers clear advantages over existing benchmarks in scale, diversity, and data quality, significantly reducing redundancy and…
Taxonomy
Topics: Fire Detection and Safety Systems · Multimodal Machine Learning Applications · Fire dynamics and safety research
