Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification
Jiaqing Zhang, Jie Lei, Weiying Xie, Geng Yang, Daixun Li, Yunsong Li

TL;DR
This paper introduces MIVit, a novel multimodal Transformer-based system that reduces redundancy and enhances feature integration in hyperspectral and LiDAR land cover classification, achieving state-of-the-art accuracy.
Contribution
The paper presents a new information aggregation and distribution mechanism within a Transformer framework for improved multimodal land cover classification.
Findings
Reports an average overall accuracy of 95.56% across three multimodal datasets.
Outperforms existing state-of-the-art methods.
Effectively reduces redundancy in multimodal features.
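As a rough illustration of how a single accuracy figure covering several datasets might be computed, the sketch below assumes the metric is overall accuracy (correct predictions divided by total samples) averaged with equal weight per dataset. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def overall_accuracy(predicted, actual):
    """Fraction of samples whose predicted label equals the reference label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_overall_accuracy(per_dataset):
    """Unweighted mean of overall accuracy over (predicted, actual) pairs.

    Equal weighting per dataset is an assumption made for this sketch.
    """
    scores = [overall_accuracy(p, a) for p, a in per_dataset]
    return sum(scores) / len(scores)


# Toy labels for three hypothetical datasets (not real results).
toy_runs = [
    ([1, 2, 3, 4], [1, 2, 3, 0]),
    ([1, 1], [1, 1]),
    ([0, 1], [1, 1]),
]
average = mean_overall_accuracy(toy_runs)
```

An unweighted mean treats a small dataset the same as a large one; pooling all samples before dividing would give a different number, so it matters which convention a reported figure follows.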
Abstract
In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative Vit (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundancy levels and integrates performance-aware elements into the fused representation, facilitating the learning of semantics in both forward and backward directions. MIVit stands out by significantly reducing redundancy in the empirical distribution of each modality's separate and fused features. It employs oriented attention fusion (OAF) for extracting shallow local features across modalities in horizontal and vertical dimensions, and a Transformer feature extractor for extracting deep global…
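To make the idea of attending along horizontal and vertical dimensions more concrete, here is a minimal pure-Python sketch. It is not the OAF module from the paper: the element-wise sum used to combine the two modalities, the mean pooling, the softmax weighting, and the name `axis_attention_fuse` are all assumptions chosen for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def axis_attention_fuse(hsi, lidar):
    """Fuse two equally sized 2-D feature maps with row and column weights.

    Hypothetical stand-in for an oriented (horizontal/vertical) fusion step;
    it does NOT reproduce the paper's OAF design.
    """
    rows, cols = len(hsi), len(hsi[0])
    # Assumed combination rule: element-wise sum of the two modalities.
    summed = [[hsi[r][c] + lidar[r][c] for c in range(cols)] for r in range(rows)]
    # Horizontal pooling: one mean per row, taken across its columns.
    row_means = [sum(row) / cols for row in summed]
    # Vertical pooling: one mean per column, taken across its rows.
    col_means = [sum(summed[r][c] for r in range(rows)) / rows for c in range(cols)]
    row_weights = softmax(row_means)
    col_weights = softmax(col_means)
    # Scale each position by the product of its row and column weights.
    return [
        [summed[r][c] * row_weights[r] * col_weights[c] for c in range(cols)]
        for r in range(rows)
    ]


# Toy 2x2 feature maps (hypothetical values).
fused = axis_attention_fuse([[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])
```

With uniform inputs every row and column receives the same weight, so the output is a uniformly scaled copy of the summed map; rows or columns with larger pooled responses would receive proportionally more weight.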
Taxonomy
Topics: Remote-Sensing Image Classification · Geographic Information Systems Studies · Image Retrieval and Classification Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dense Connections · Position-Wise Feed-Forward Layer
