Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification

Jiaqing Zhang; Jie Lei; Weiying Xie; Geng Yang; Daixun Li; Yunsong Li

arXiv:2401.03179 · cs.CV · January 24, 2024 · 1 citation

TL;DR

This paper introduces MIVit, a novel multimodal Transformer-based system that reduces redundancy and enhances feature integration in hyperspectral and LiDAR land cover classification, achieving state-of-the-art accuracy.

Contribution

The paper presents a new information aggregation and distribution mechanism within a Transformer framework for improved multimodal land cover classification.

Findings

1. Achieves an average overall classification accuracy of 95.56% across three multimodal datasets.
2. Outperforms existing state-of-the-art methods.
3. Effectively reduces redundancy in multimodal features.

Abstract

In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative ViT (MIVit), a system with an innovative information aggregation-distribution mechanism. This approach redefines redundancy levels and integrates performance-aware elements into the fused representation, facilitating the learning of semantics in both forward and backward directions. MIVit stands out by significantly reducing redundancy in the empirical distribution of each modality's separate and fused features. It employs oriented attention fusion (OAF) to extract shallow local features across modalities in the horizontal and vertical dimensions, and a Transformer feature extractor to extract deep global…
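The abstract does not spell out how OAF operates internally. One plausible reading, sketched here purely as an illustration, is cross-modal axial attention: hyperspectral (HSI) features act as queries over LiDAR features, once along each row (horizontal) and once along each column (vertical). This is an assumption, not the authors' OAF. The function names `oriented_cross_attention` and `toy_oriented_fusion` are hypothetical, learned query/key/value projections are omitted, and summing the two directional outputs back onto the HSI features is a simplifying choice.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def oriented_cross_attention(query_map, context_map, axis):
    """Attend from query_map to context_map along one spatial axis.

    query_map, context_map: arrays of shape (H, W, C).
    axis="horizontal": each pixel attends only to pixels in its own row.
    axis="vertical":   each pixel attends only to pixels in its own column.
    """
    if axis == "vertical":
        # Swap H and W so columns become rows, reuse the row-wise path, swap back.
        out = oriented_cross_attention(query_map.transpose(1, 0, 2),
                                       context_map.transpose(1, 0, 2), "horizontal")
        return out.transpose(1, 0, 2)
    if axis != "horizontal":
        raise ValueError("axis must be 'horizontal' or 'vertical'")
    c = query_map.shape[-1]
    # scores[h, i, j]: scaled dot product between query pixel (h, i) and context pixel (h, j).
    scores = np.einsum("hic,hjc->hij", query_map, context_map) / np.sqrt(c)
    weights = softmax(scores, axis=-1)
    # Weighted sum of context pixels within the same row.
    return np.einsum("hij,hjc->hic", weights, context_map)

def toy_oriented_fusion(hsi, lidar):
    # Hypothetical fusion: HSI queries LiDAR along rows and columns,
    # and both directional outputs are added back as a residual.
    horizontal = oriented_cross_attention(hsi, lidar, "horizontal")
    vertical = oriented_cross_attention(hsi, lidar, "vertical")
    return hsi + horizontal + vertical

rng = np.random.default_rng(0)
hsi = rng.standard_normal((7, 7, 16))    # toy 7x7 patch, 16 channels per pixel
lidar = rng.standard_normal((7, 7, 16))  # toy LiDAR features, same shape
fused = toy_oriented_fusion(hsi, lidar)
print(fused.shape)  # (7, 7, 16)
```

Restricting attention to a single row or column costs O(HW(H+W)) score computations instead of O((HW)^2) for full 2-D attention over the patch, which is the usual motivation for axial-style attention.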

Code & Models

Repositories

icey-zhang/MIViT (PyTorch, official)

Taxonomy

Topics: Remote-Sensing Image Classification · Geographic Information Systems Studies · Image Retrieval and Classification Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Adam · Layer Normalization · Residual Connection · Absolute Position Encodings · Dense Connections · Position-Wise Feed-Forward Layer
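The Methods list names the standard Transformer building blocks. The NumPy sketch below shows how they compose into one encoder block; it is a generic illustration and is not taken from the icey-zhang/MIViT code. Assumptions: pre-norm ordering (as in ViT), fixed sinusoidal absolute position encodings (as in "Attention Is All You Need"; ViT instead learns them), a ReLU feed-forward layer, LayerNorm without learned gain/bias, random untrained weights, and dropout omitted as at inference time. All function names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean, unit variance (no learned gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sinusoidal_positions(num_tokens, dim):
    # Absolute sinusoidal encodings; dim is assumed to be even.
    pos = np.arange(num_tokens)[:, None]
    i = np.arange(0, dim, 2)[None, :]
    angles = pos / np.power(10000.0, i / dim)
    pe = np.zeros((num_tokens, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def multi_head_attention(x, wq, wk, wv, wo, num_heads):
    n, d = x.shape
    hd = d // num_heads
    # Linear projections, then split the model dimension into heads: (heads, tokens, hd).
    q = (x @ wq).reshape(n, num_heads, hd).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, hd).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, hd).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(hd), axis=-1)
    heads = weights @ v  # (heads, tokens, hd)
    # Concatenate heads back to (tokens, d) and apply the output projection.
    return heads.transpose(1, 0, 2).reshape(n, d) @ wo

def encoder_block(x, params, num_heads):
    # Sub-layer 1: LayerNorm -> multi-head self-attention -> residual connection.
    attn = multi_head_attention(layer_norm(x), params["wq"], params["wk"],
                                params["wv"], params["wo"], num_heads)
    x = x + attn
    # Sub-layer 2: LayerNorm -> position-wise feed-forward (ReLU) -> residual connection.
    hidden = np.maximum(0.0, layer_norm(x) @ params["w1"] + params["b1"])
    return x + hidden @ params["w2"] + params["b2"]

def init_params(dim, ffn_dim, rng):
    # Random, untrained weights scaled by 1/sqrt(fan_in); for illustration only.
    s = 1.0 / np.sqrt(dim)
    return {
        "wq": rng.standard_normal((dim, dim)) * s,
        "wk": rng.standard_normal((dim, dim)) * s,
        "wv": rng.standard_normal((dim, dim)) * s,
        "wo": rng.standard_normal((dim, dim)) * s,
        "w1": rng.standard_normal((dim, ffn_dim)) * s,
        "b1": np.zeros(ffn_dim),
        "w2": rng.standard_normal((ffn_dim, dim)) / np.sqrt(ffn_dim),
        "b2": np.zeros(dim),
    }

rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 32))       # 10 toy tokens, model dimension 32
x = tokens + sinusoidal_positions(10, 32)    # inject absolute position information
out = encoder_block(x, init_params(32, 64, rng), num_heads=4)
print(out.shape)  # (10, 32)
```

Without the added position encodings, the block is permutation-equivariant (reordering input tokens just reorders the outputs), which is why an absolute or learned position signal is added before the first block.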