TL;DR
This paper introduces a novel training strategy for multimodal segmentation that uses the pretrained latent space to guide scenario sampling, improving performance when modalities are missing.
Contribution
It proposes a scenario sampling method based on latent-space distortion, which improves fine-tuning for missing-modality scenarios in remote sensing segmentation.
Findings
Outperforms standard fine-tuning and LoRA-based adaptation.
Effective across multiple remote sensing datasets and backbones.
Utilizes pretrained latent space for more informative scenario sampling.
Abstract
Multimodal semantic segmentation benefits remote sensing analysis by combining complementary information from different sensor modalities. In real-world remote sensing applications, one or more modalities may be unavailable due to sensor failures, adverse atmospheric conditions, or data acquisition problems. Even with pretrained multimodal representations and existing fine-tuning or adaptation strategies, performance may remain limited because all modality availability scenarios are typically treated as equally informative during training. In this paper, we propose a novel training strategy that learns a scenario sampling distribution directly from the pretrained latent space. Instead of relying on uniform random modality dropout, the proposed method guides fine-tuning toward more informative modality availability scenarios. More specifically, we quantify the effect of each scenario…
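The contrast between uniform modality dropout and a learned scenario distribution can be illustrated with a small sketch. The abstract is cut off before it says how each scenario is scored, so everything specific here is an assumption rather than the authors' method: the modality names, the toy distortion score (the number of missing modalities stands in for a real latent-space measurement), the softmax weighting, the temperature parameter, and the premise that higher distortion means a more informative scenario.

```python
import itertools
import math
import random


def enumerate_scenarios(modalities):
    """Return every non-empty subset of modalities (one availability scenario each)."""
    scenarios = []
    for k in range(1, len(modalities) + 1):
        scenarios.extend(itertools.combinations(modalities, k))
    return scenarios


def scenario_distribution(distortions, temperature=1.0):
    """Turn per-scenario distortion scores into sampling probabilities.

    Assumption: a softmax is used so that scenarios with larger distortion
    are sampled more often. The paper may use a different rule.
    """
    top = max(distortions.values())  # subtract the max for numerical stability
    weights = {s: math.exp((d - top) / temperature) for s, d in distortions.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def sample_scenario(probs, rng):
    """Draw one availability scenario according to the given probabilities."""
    scenarios = list(probs)
    return rng.choices(scenarios, weights=[probs[s] for s in scenarios], k=1)[0]


# Hypothetical modality names, chosen only for illustration.
modalities = ["optical", "sar", "dsm"]
scenarios = enumerate_scenarios(modalities)

# Placeholder distortion: count of missing modalities. In practice this would
# be measured in the pretrained latent space, which the abstract does not detail.
distortions = {s: float(len(modalities) - len(s)) for s in scenarios}

probs = scenario_distribution(distortions, temperature=1.0)
rng = random.Random(0)
batch_scenario = sample_scenario(probs, rng)  # modalities kept for one training step
```

Uniform random dropout corresponds to giving every scenario the same probability; the sketch differs only in replacing that flat distribution with one derived from a per-scenario score.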
