SegFly: A 2D-3D-2D Paradigm for Aerial RGB-Thermal Semantic Segmentation at Scale
Markus Gross, Sai Bharadhwaj Matha, Rui Song, Viswanathan Muthuveerappan, Conrad Christoph, Julius Huber, Daniel Cremers

TL;DR
SegFly introduces a scalable 2D-3D-2D framework leveraging multi-view redundancy and geometry to automatically generate dense, high-quality RGB and thermal annotations for aerial imagery, enabling large-scale multi-modal semantic segmentation.
Contribution
The paper presents a novel geometry-driven 2D-3D-2D paradigm that automates label propagation and RGB-T alignment, significantly reducing manual effort and expanding the scale of aerial semantic segmentation datasets.
Findings
Automatically generates 97% of RGB labels and 100% of thermal labels with high accuracy, starting from manual annotations on fewer than 3% of the RGB images.
Achieves 87% registration accuracy for RGB-T alignment without hardware synchronization.
Constructs a large-scale, diverse aerial dataset with over 20,000 images and 15,000 RGB-T pairs.
Abstract
Semantic segmentation for uncrewed aerial vehicles (UAVs) is fundamental for aerial scene understanding, yet existing RGB and RGB-T datasets remain limited in scale, diversity, and annotation efficiency due to the high cost of manual labeling and the difficulties of accurate RGB-T alignment on off-the-shelf UAVs. To address these challenges, we propose a scalable geometry-driven 2D-3D-2D paradigm that leverages multi-view redundancy in high-overlap aerial imagery to automatically propagate labels from a small subset of manually annotated RGB images to both RGB and thermal modalities within a unified framework. By lifting less than 3% of RGB images into a semantic 3D point cloud and reprojecting it into all views, our approach enables dense pseudo ground-truth generation across large image collections, automatically producing 97% of RGB labels and 100% of thermal labels while achieving…
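The abstract describes two steps: lifting labels from a few annotated RGB images into a semantic 3D point cloud, then reprojecting that cloud into every view. The sketch below shows the general idea under assumptions that are ours, not the paper's. It assumes a point cloud that has already been reconstructed (e.g. by structure-from-motion), a known pinhole pose and intrinsics for every RGB and thermal image, majority voting for the 2D-to-3D step, and nearest-point z-buffering for occlusion. All names (`PinholeCamera`, `zbuffer`, `lift_labels`, `reproject_labels`) are hypothetical. The output is a sparse per-pixel label map. The dense pseudo ground truth described in the abstract would also need a densification step, which is not shown here.

```python
import math
from collections import Counter
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class PinholeCamera:
    """Hypothetical pinhole model: x_cam = R @ x_world + t, then intrinsics."""
    fx: float
    fy: float
    cx: float
    cy: float
    R: tuple  # 3x3 rotation as a tuple of rows
    t: Vec3   # translation (world -> camera)
    width: int
    height: int

    def project(self, p):
        """Return (u, v, depth) if p lands inside the image, else None."""
        x, y, z = (sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i]
                   for i in range(3))
        if z <= 0:
            return None  # point lies behind the camera
        # Pixel (u, v) covers [u, u+1) x [v, v+1) in image coordinates.
        u = math.floor(self.fx * x / z + self.cx)
        v = math.floor(self.fy * y / z + self.cy)
        if 0 <= u < self.width and 0 <= v < self.height:
            return u, v, z
        return None


def zbuffer(points, cam):
    """Map each pixel to the index of the closest projected point (occlusion)."""
    best = {}  # (u, v) -> (depth, point index)
    for idx, p in enumerate(points):
        hit = cam.project(p)
        if hit is None:
            continue
        u, v, z = hit
        if (u, v) not in best or z < best[(u, v)][0]:
            best[(u, v)] = (z, idx)
    return {pix: idx for pix, (_, idx) in best.items()}


def lift_labels(points, labeled_views):
    """2D -> 3D: each point takes the majority label of the views that see it."""
    votes = [Counter() for _ in points]
    for cam, label_map in labeled_views:  # label_map: (u, v) -> class id
        for pix, idx in zbuffer(points, cam).items():
            if pix in label_map:
                votes[idx][label_map[pix]] += 1
    # Points never seen by an annotated view stay unlabeled (None).
    return [c.most_common(1)[0][0] if c else None for c in votes]


def reproject_labels(points, point_labels, cam):
    """3D -> 2D: render labeled points into any view (RGB or thermal camera)."""
    labeled = [(p, lab) for p, lab in zip(points, point_labels) if lab is not None]
    pts = [p for p, _ in labeled]
    return {pix: labeled[idx][1] for pix, idx in zbuffer(pts, cam).items()}
```

The thermal camera enters only through its own `PinholeCamera`, so the same `reproject_labels` call covers both modalities. This matches the abstract's point that one 3D model can label RGB and thermal images alike. This section does not say how SegFly actually recovers thermal poses.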
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
