Multimodal Fusion Strategies for Mapping Biophysical Landscape Features
Lucia Gordon, Nico Lang, Catherine Ressijac, Andrew Davies

TL;DR
This study compares three multimodal fusion strategies in deep learning models to classify landscape features in aerial imagery, revealing that different methods excel at detecting specific features in African savanna ecosystems.
Contribution
The paper systematically evaluates Early fusion, Late fusion, and Mixture of Experts as strategies for combining thermal, RGB, and LiDAR imagery in ecological mapping.
Findings
Late fusion achieves an AUC of 0.698 overall.
Early fusion has the best recall for middens and water.
Mixture of Experts has the best recall for termite mounds.
Abstract
Multimodal aerial data are used to monitor natural systems, and machine learning can significantly accelerate the classification of landscape features within such imagery to benefit ecology and conservation. It remains under-explored, however, how these multiple modalities ought to be fused in a deep learning model. As a step towards filling this gap, we study three strategies (Early fusion, Late fusion, and Mixture of Experts) for fusing thermal, RGB, and LiDAR imagery using a dataset of spatially-aligned orthomosaics in these three modalities. In particular, we aim to map three ecologically-relevant biophysical landscape features in African savanna ecosystems: rhino middens, termite mounds, and water. The three fusion strategies differ in whether the modalities are fused early or late, and if late, whether the model learns fixed weights per modality for each class or generates weights…
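To make the distinction between the three strategies concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Every name in it (`early_fusion`, `late_fusion`, `moe_fusion`, `random_matrix`, the feature vectors) is hypothetical, the "learned" parameters are random stand-ins, and each modality is reduced to a short feature vector rather than an image tile passed through an encoder. Two further assumptions go beyond the truncated abstract: the fusion weights are normalized with a softmax, and the Mixture of Experts weights are generated by a gate conditioned on the input, as is usual for that technique.

```python
import math
import random

CLASSES = ["midden", "mound", "water"]
MODALITIES = ["thermal", "rgb", "lidar"]


def softmax(scores):
    """Normalize scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(matrix, vector):
    """Matrix-vector product, with the matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned parameters (untrained)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def concat(features):
    """Stack the modality feature vectors in a fixed order."""
    return [v for m in MODALITIES for v in features[m]]


def early_fusion(features, head):
    """Fuse first: one shared head sees all modalities at once."""
    return linear(head, concat(features))


def late_fusion(features, heads, class_weights):
    """Fuse last: per-modality heads, then fixed weights per modality for each class."""
    logits = [linear(heads[m], features[m]) for m in MODALITIES]
    fused = []
    for c in range(len(CLASSES)):
        w = softmax(class_weights[c])  # assumption: softmax-normalized weights
        fused.append(sum(w[i] * logits[i][c] for i in range(len(MODALITIES))))
    return fused


def moe_fusion(features, heads, gates):
    """Fuse last, but weights come from a gate that looks at the input (assumption)."""
    logits = [linear(heads[m], features[m]) for m in MODALITIES]
    x = concat(features)
    fused = []
    for c in range(len(CLASSES)):
        w = softmax(linear(gates[c], x))  # one generated weight per modality
        fused.append(sum(w[i] * logits[i][c] for i in range(len(MODALITIES))))
    return fused


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4  # hypothetical feature size per modality
    feats = {m: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for m in MODALITIES}
    heads = {m: random_matrix(len(CLASSES), dim, rng) for m in MODALITIES}
    early_head = random_matrix(len(CLASSES), dim * len(MODALITIES), rng)
    class_weights = random_matrix(len(CLASSES), len(MODALITIES), rng)
    gates = [random_matrix(len(MODALITIES), dim * len(MODALITIES), rng) for _ in CLASSES]
    print("early:", early_fusion(feats, early_head))
    print("late: ", late_fusion(feats, heads, class_weights))
    print("moe:  ", moe_fusion(feats, heads, gates))
```

In this sketch the only difference between `late_fusion` and `moe_fusion` is where the weights come from: a fixed table indexed by class in the first case, a function of the current input in the second. With all-zero weights or gates, both reduce to a plain average of the per-modality logits.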
Taxonomy
Topics: Remote Sensing and Land Use · Remote Sensing and LiDAR Applications · Remote-Sensing Image Classification
