Two Headed Dragons: Multimodal Fusion and Cross Modal Transactions
Rupak Bose, Shivam Pande, Biplab Banerjee

TL;DR
This paper introduces a transformer-based fusion method for combining hyperspectral and LiDAR data in remote sensing, enabling cross-modal interactions and improving recognition accuracy.
Contribution
It proposes a novel transformer-based fusion approach with cross key-value pairs and CNNs for effective multimodal data integration in remote sensing.
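The idea of cross key-value pairs can be illustrated with a minimal NumPy sketch of cross-attention, in which queries from one modality attend over keys and values computed from the other. This is an assumption-laden illustration, not the authors' architecture: the token counts, embedding size, single attention head, weights shared between the two directions, and the names `cross_attention`, `hsi_tokens`, and `lidar_tokens` are all hypothetical, and the CNN feature extractors are replaced by random embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head attention where queries come from one modality and
    keys/values come from the other (hypothetical helper)."""
    q = query_tokens @ w_q            # (n_query, d)
    k = context_tokens @ w_k          # (n_context, d)
    v = context_tokens @ w_v          # (n_context, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
d_model = 8                                      # assumed embedding size
hsi_tokens = rng.normal(size=(5, d_model))       # stand-in for CNN features from HSI
lidar_tokens = rng.normal(size=(3, d_model))     # stand-in for CNN features from LiDAR

# Weights are shared across both directions purely to keep the sketch short.
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

# HSI queries attend over LiDAR keys/values, and vice versa.
hsi_out, hsi_weights = cross_attention(hsi_tokens, lidar_tokens, w_q, w_k, w_v)
lidar_out, lidar_weights = cross_attention(lidar_tokens, hsi_tokens, w_q, w_k, w_v)

print(hsi_out.shape, lidar_out.shape)
```

Each output keeps the token count of the querying modality while mixing in information from the other one, which is the sense in which the two streams communicate.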
Findings
Achieved competitive results on the Houston and MUUFL Gulfport datasets.
Demonstrated effective cross-modal communication between HSI and LiDAR.
Enhanced recognition performance through multimodal fusion.
Abstract
As the field of remote sensing evolves, we witness the accumulation of information from several modalities, such as multispectral (MS), hyperspectral (HSI), and LiDAR. Each of these modalities possesses its own distinct characteristics, and when combined synergistically, they perform very well in recognition and classification tasks. However, fusing multiple modalities in remote sensing is cumbersome because their domains are highly disparate. Furthermore, existing methods do not facilitate cross-modal interactions. To this end, we propose a novel transformer-based fusion method for the HSI and LiDAR modalities. The model is composed of stacked autoencoders that harness cross key-value pairs for HSI and LiDAR, thus establishing communication between the two modalities, while simultaneously using CNNs to extract spectral and spatial information from HSI and LiDAR. We test our model…
