Multimodal Fusion Transformer for Remote Sensing Image Classification
Swalpa Kumar Roy, Ankur Deria, Danfeng Hong, Behnood Rasti, Antonio Plaza, Jocelyn Chanussot

TL;DR
This paper introduces a multimodal fusion transformer that leverages multiple data sources like hyperspectral images and LiDAR for improved land-cover classification, demonstrating superior performance over existing models on benchmark datasets.
Contribution
The paper proposes a novel multimodal fusion transformer with multihead cross patch attention, enhancing generalization in hyperspectral image classification by integrating complementary data sources.
Findings
Outperforms state-of-the-art transformers and CNNs on benchmark datasets
Uses multihead cross patch attention for better feature integration
Achieves superior land-cover classification accuracy
Abstract
Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR), offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary…
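The central idea above, replacing a randomly initialized CLS token with one derived from a second modality (such as LiDAR) that then attends over HSI patch tokens, can be illustrated as single-query multihead cross attention. The following is a minimal NumPy sketch under stated assumptions, not the authors' mCrossPA implementation: the function name `cross_patch_attention`, the linear projection of a flattened LiDAR patch into the CLS token, the choice of keys and values over the concatenated [CLS; patches] sequence, and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_patch_attention(cls_token, patch_tokens, w_q, w_k, w_v, w_o, num_heads):
    """Hypothetical sketch: one CLS query attends over [CLS; patch tokens].

    cls_token:    (1, d) token built from the auxiliary modality (assumed LiDAR).
    patch_tokens: (n, d) HSI patch embeddings.
    w_q, w_k, w_v, w_o: (d, d) projection matrices.
    Returns the updated CLS token (1, d) and attention weights (num_heads, 1, n + 1).
    """
    d = cls_token.shape[-1]
    assert d % num_heads == 0, "embedding dim must be divisible by num_heads"
    head_dim = d // num_heads
    tokens = np.concatenate([cls_token, patch_tokens], axis=0)  # (n + 1, d)

    # Only the CLS token produces a query; keys and values come from every token.
    q = (cls_token @ w_q).reshape(1, num_heads, head_dim).transpose(1, 0, 2)  # (h, 1, hd)
    k = (tokens @ w_k).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)    # (h, n+1, hd)
    v = (tokens @ w_v).reshape(-1, num_heads, head_dim).transpose(1, 0, 2)    # (h, n+1, hd)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (h, 1, n+1)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                          # (h, 1, hd)
    out = out.transpose(1, 0, 2).reshape(1, d)              # concatenate heads -> (1, d)
    return out @ w_o, attn

# Usage with random data; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
d, n, heads = 64, 121, 8
lidar_patch = rng.normal(size=(1, 121))   # hypothetical flattened 11x11 LiDAR patch
w_lidar = rng.normal(size=(121, d))       # hypothetical LiDAR-to-CLS projection
cls = lidar_patch @ w_lidar               # CLS token from the auxiliary modality
patches = rng.normal(size=(n, d))         # stand-in HSI patch embeddings
ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
new_cls, attn = cross_patch_attention(cls, patches, *ws, num_heads=heads)
print(new_cls.shape, attn.shape)  # (1, 64) (8, 1, 122)
```

Restricting queries to the CLS token keeps the attention cost linear in the number of patches, and the updated token can then feed a classification head; whether MFT makes exactly this choice should be checked against the paper.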
Taxonomy
Topics: Advanced Computational Techniques and Applications · Remote-Sensing Image Classification · Neural Networks and Applications
