TL;DR
This paper introduces CBC-SLP, a multimodal semantic segmentation model that preserves both shared (modality-invariant) and modality-specific information, improving robustness and performance both when some modalities are missing and when all are available.
Contribution
The paper proposes a structured latent projection, built into the model architecture, that improves multimodal segmentation, especially under modality dropout.
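To make the idea of a structured latent space concrete, here is a minimal sketch, not the paper's implementation. It assumes each modality's features are projected into a shared part and a modality-specific part, that shared parts are averaged over whichever modalities are present, and that the specific slot of a missing modality is zero-filled. The modality names, dimensions, random projection matrices, and the `fuse` function are all hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and modality names, chosen only for illustration.
FEAT_DIM, SHARED_DIM, SPECIFIC_DIM = 16, 8, 4
MODALITIES = ["optical", "sar"]

# Random linear maps standing in for learned projection layers (assumption).
proj = {
    m: {
        "shared": rng.standard_normal((FEAT_DIM, SHARED_DIM)),
        "specific": rng.standard_normal((FEAT_DIM, SPECIFIC_DIM)),
    }
    for m in MODALITIES
}


def fuse(features):
    """Build a structured latent from per-modality features.

    features maps a modality name to an (N, FEAT_DIM) array, or to None
    when that modality is missing.
    """
    available = [m for m in MODALITIES if features.get(m) is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    n = features[available[0]].shape[0]

    # Shared part: average the shared projections of the available modalities.
    shared = np.mean(
        [features[m] @ proj[m]["shared"] for m in available], axis=0
    )

    # Specific part: one fixed slot per modality, zero-filled when missing,
    # so the latent layout stays the same under modality dropout.
    specific = [
        features[m] @ proj[m]["specific"]
        if m in available
        else np.zeros((n, SPECIFIC_DIM))
        for m in MODALITIES
    ]
    return np.concatenate([shared] + specific, axis=1)
```

Calling `fuse({"optical": x, "sar": None})` returns a latent with the same width as the full-modality case, which is the property a downstream segmentation head would rely on in this sketch. How CBC-SLP actually structures, trains, and fills its latent space is not described in this summary.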
Findings
CBC-SLP outperforms state-of-the-art models in both full-modality and missing-modality scenarios.
The approach effectively recovers complementary information lost in shared representations.
Experiments on remote sensing datasets validate the robustness and effectiveness of the method.
Abstract
Multimodal remote sensing data provide complementary information for semantic segmentation, but in real-world deployments, some modalities may be unavailable due to sensor failures, acquisition issues, or challenging atmospheric conditions. Existing multimodal segmentation models typically address missing modalities by learning a shared representation across inputs. However, this approach can introduce a trade-off by compromising modality-specific complementary information and reducing performance when all modalities are available. In this paper, we tackle this limitation with CBC-SLP, a multimodal semantic segmentation model designed to preserve both modality-invariant and modality-specific information. Inspired by the theoretical results on modality alignment, which state that perfectly aligned multimodal representations can lead to sub-optimal performance in downstream prediction…
