L-MCAT: Unpaired Multimodal Transformer with Contrastive Attention for Label-Efficient Satellite Image Classification
Mitul Goswami, Mrinal Goswami

TL;DR
L-MCAT is a transformer-based framework that enables label-efficient satellite image classification by aligning unpaired multimodal data with contrastive attention, achieving high accuracy with minimal labels and computational resources.
Contribution
The paper introduces two modules for aligning unpaired multimodal data, Modality-Spectral Adapters (MSA) and Unpaired Multimodal Attention Alignment (U-MAA), which reduce the label and compute requirements of satellite image classification.
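As a rough illustration of the adapter idea, the abstract describes Modality-Spectral Adapters as compressing high-dimensional sensor inputs into a unified embedding space. The sketch below is a minimal stand-in for that idea, not the authors' implementation: it assumes one plain linear projection per modality, and the names `make_adapter`, `apply_adapter`, and `EMBED_DIM`, the random initialization, and the band counts (2 for SAR, 13 for optical) are all illustrative assumptions.

```python
import random

def make_adapter(in_bands, embed_dim, seed=0):
    """Build a hypothetical linear adapter: a weight matrix of shape (embed_dim, in_bands)."""
    rng = random.Random(seed)
    scale = 1.0 / in_bands ** 0.5  # keep output magnitudes comparable across band counts
    return [[rng.uniform(-scale, scale) for _ in range(in_bands)]
            for _ in range(embed_dim)]

def apply_adapter(weights, pixel):
    """Project one pixel's spectral vector into the shared embedding space."""
    if len(pixel) != len(weights[0]):
        raise ValueError("band count does not match this adapter")
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]

EMBED_DIM = 8  # assumed shared embedding width, chosen only for the example

# One adapter per modality; the band counts are assumptions for illustration.
sar_adapter = make_adapter(in_bands=2, embed_dim=EMBED_DIM, seed=1)
optical_adapter = make_adapter(in_bands=13, embed_dim=EMBED_DIM, seed=2)

sar_token = apply_adapter(sar_adapter, [0.3, -0.1])
optical_token = apply_adapter(optical_adapter, [0.05 * i for i in range(13)])

# Both modalities now live in vectors of the same length.
print(len(sar_token), len(optical_token))
```

Whatever form the real adapters take, the point of the sketch is that inputs with different numbers of spectral bands end up as tokens of one common width, which is what lets a single transformer attend over them.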
Findings
Achieves 95.4% overall accuracy on the SEN12MS dataset with only 20 labels per class.
Uses 47x fewer parameters and 23x fewer FLOPs than MCTrans.
Maintains over 92% accuracy under 50% spatial misalignment.
Abstract
We propose the Lightweight Multimodal Contrastive Attention Transformer (L-MCAT), a novel transformer-based framework for label-efficient remote sensing image classification using unpaired multimodal satellite data. L-MCAT introduces two core innovations: (1) Modality-Spectral Adapters (MSA) that compress high-dimensional sensor inputs into a unified embedding space, and (2) Unpaired Multimodal Attention Alignment (U-MAA), a contrastive self-supervised mechanism integrated into the attention layers to align heterogeneous modalities without pixel-level correspondence or labels. L-MCAT achieves 95.4% overall accuracy on the SEN12MS dataset using only 20 labels per class, outperforming state-of-the-art baselines while using 47x fewer parameters and 23x fewer FLOPs than MCTrans. It maintains over 92% accuracy even under 50% spatial misalignment, demonstrating robustness for real-world…
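To make the contrastive part of U-MAA more concrete, here is a minimal sketch of a generic InfoNCE-style loss between embeddings from two modalities. It is a stand-in under stated assumptions, not the paper's method: the abstract does not say how positives are selected without pixel-level correspondence or how the loss is wired into the attention layers, so the sketch simply assumes that `anchors[i]` and `candidates[i]` are a matching pair (for example through some hypothetical scene-level matching). The function names and the temperature value are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss, assuming anchors[i] matches candidates[i] (an assumption)."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

# Toy embeddings for two modalities in a shared 2-D space.
sar = [[1.0, 0.0], [0.0, 1.0]]
optical_aligned = [[0.9, 0.1], [0.1, 0.9]]   # matches point the same way
optical_swapped = [[0.1, 0.9], [0.9, 0.1]]   # matches point the wrong way

loss_aligned = info_nce(sar, optical_aligned)
loss_swapped = info_nce(sar, optical_swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each embedding is closest to its assumed counterpart in the other modality and large otherwise, which is the sense in which a contrastive objective pulls heterogeneous modalities into alignment.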
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Neural Network Applications · Machine Learning and Data Classification
