ConvVitMamba: Efficient Multiscale Convolution, Transformer, and Mamba-Based Sequence modelling for Hyperspectral Image Classification

Mohammed Q. Alkhatib

arXiv:2604.18856·cs.CV·April 22, 2026

TL;DR

ConvVitMamba is a hybrid deep learning framework that combines multiscale convolution, Vision Transformers, and Mamba-inspired modules to efficiently classify hyperspectral images with high accuracy and reduced computational cost.

Contribution

The paper introduces a unified architecture that integrates a multiscale convolutional feature extractor, a Vision Transformer encoder, and a Mamba-inspired gated sequence mixing module for efficient hyperspectral image classification.

Findings

01. Outperforms CNN, Transformer, and Mamba-based methods on benchmark datasets.

02. Achieves a good balance between accuracy, model size, and inference speed.

03. Ablation studies validate the effectiveness of each component.

Abstract

Hyperspectral image (HSI) classification remains challenging due to high spectral dimensionality, redundancy, and limited labeled data. Although convolutional neural networks (CNNs) and Vision Transformers (ViTs) achieve strong performance by exploiting spectral-spatial information and long-range dependencies, they often incur high computational cost and large model size, which limits their practical use. To address these limitations, a unified hybrid framework, termed ConvVitMamba, is proposed for efficient HSI classification. The architecture integrates three components: a multiscale convolutional feature extractor to capture local spectral, spatial, and joint patterns; a Vision Transformer-based tokenization and encoding stage to model global contextual relationships; and a lightweight Mamba-inspired gated sequence mixing module for efficient content-aware refinement without quadratic…
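
To make the idea of gated sequence mixing concrete, the following is a minimal illustrative sketch in plain Python. It is not the ConvVitMamba implementation: the function name `gated_sequence_mix`, the scalar parameters `gate_weight` and `gate_bias`, and the exact update rule are all assumptions chosen for illustration. It shows one generic way a content-aware gate can blend each token with a running state in a single pass over the sequence.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_sequence_mix(tokens, gate_weight=1.0, gate_bias=0.0):
    """Hypothetical gated mixing over a token sequence (illustration only).

    tokens: list of feature vectors (lists of floats), one per token.
    Returns a list with the same shape as the input.
    """
    if not tokens:
        return []
    dim = len(tokens[0])
    state = [0.0] * dim  # running state, starts at zero
    mixed = []
    for token in tokens:  # one pass: cost grows linearly with sequence length
        new_state = []
        for h_prev, x in zip(state, token):
            # The gate depends on the current input, so it is content-aware.
            g = sigmoid(gate_weight * x + gate_bias)
            # Convex blend of the new input and the previous state.
            new_state.append(g * x + (1.0 - g) * h_prev)
        state = new_state
        mixed.append(state)
    return mixed


if __name__ == "__main__":
    sequence = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
    for step, vector in enumerate(gated_sequence_mix(sequence)):
        print(step, [round(v, 4) for v in vector])
```

Because each step touches only the current token and the previous state, the work in this sketch scales with the number of tokens rather than with the number of token pairs, which is the cost that full self-attention pays. A trained model would learn the gate parameters per channel instead of using fixed scalars.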

Code & Models

Repositories

mqalkhatib/ConvVitMamba (GitHub)
