Wildfire Detection Using Vision Transformer with the Wildfire Dataset
Gowtham Raj Vuppari, Navarun Gupta, Ahmed El-Sayed, Xingguo Xiong

TL;DR
This paper explores the use of Vision Transformers trained on a large wildfire image dataset to improve early wildfire detection, addressing challenges like data quality and environmental factors.
Contribution
It introduces a ViT-based approach trained on a comprehensive wildfire dataset, demonstrating its potential for accurate real-time wildfire detection.
Findings
High accuracy achieved in wildfire classification
Effective preprocessing pipeline for high-resolution images
Potential for real-time early wildfire detection
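The summary does not describe the preprocessing pipeline in detail, so the following is only a minimal sketch of how a high-resolution frame might be turned into ViT input tokens. The 224x224 input size and 16x16 patch size are the common ViT-Base defaults and are assumed here, not taken from the paper; the function names (center_crop, patchify, preprocess) are hypothetical. A real pipeline would typically resize the image rather than only crop it, and would also apply dataset-specific normalization.

```python
import numpy as np


def center_crop(image: np.ndarray, size: int) -> np.ndarray:
    """Cut a size x size window from the middle of an H x W x C image."""
    h, w = image.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop")
    top = (h - size) // 2
    left = (w - size) // 2
    return image[top:top + size, left:left + size]


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an H x W x C image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch * patch * C), ordered
    row by row across the patch grid.
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return grid.reshape(-1, patch * patch * c)


def preprocess(image: np.ndarray, size: int = 224, patch: int = 16) -> np.ndarray:
    """Crop, scale pixel values to [0, 1], and split into patch tokens.

    size=224 and patch=16 are assumed defaults, not values from the paper.
    """
    cropped = center_crop(image, size).astype(np.float32) / 255.0
    return patchify(cropped, patch)


if __name__ == "__main__":
    # Stand-in for a 1080p RGB camera frame.
    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
    tokens = preprocess(frame)
    print(tokens.shape)
```

Under these assumed settings each frame becomes 196 patch vectors of length 768, which a ViT would then linearly project and pass through its transformer encoder.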
Abstract
The rising frequency and intensity of wildfires in the US, especially in California, have highlighted the critical need for sophisticated detection techniques. In 2023, wildfires caused 130 deaths nationwide, the highest since 1990. In January 2025, the Los Angeles wildfires, which included the Palisades and Eaton fires, burned approximately 40,000 acres and 12,000 buildings and caused loss of human life. The devastation underscores the urgent need for effective detection and prevention strategies. Deep learning models, such as Vision Transformers (ViTs), can enhance early detection by processing complex image data with high accuracy. However, wildfire detection faces challenges, including the limited availability of high-quality, real-time data. Wildfires often occur in remote areas with limited sensor coverage, and environmental factors like smoke and cloud cover can hinder detection.…
