TL;DR
This paper introduces a CNN-Transformer hybrid network with pooling attention fusion for hyperspectral image (HSI) classification, in which CNNs capture spatial features and a vision transformer captures spectral features, while information is preserved across layers.
Contribution
It proposes a synergistic CNN-Transformer architecture whose modules, including a Twin-Branch Feature Extraction (TBFE) module, jointly extract spatial-spectral features and preserve information across layers; the authors report that it outperforms existing methods.
Findings
The proposed method achieves higher classification accuracy than the compared methods on multiple hyperspectral datasets.
Extensive experiments validate the effectiveness of the pooling attention fusion approach.
The code is publicly available at the provided GitHub repository.
Abstract
In the hyperspectral image (HSI) classification task, each pixel is assigned to a specific land-cover category or material. Convolutional neural networks (CNNs) and transformers have been widely used to extract local and non-local features in HSI classification. Recent works have used a multi-scale vision transformer (ViT) to enhance spectral feature capture, with promising results. However, most existing methods still struggle to make effective joint use of spatial-spectral information and to preserve information across layers during propagation. To address these issues, we propose a synergistic CNN-Transformer network with pooling attention fusion for HSI classification, which uses CNNs and a ViT collaboratively to process spatial and spectral features separately. Specifically, we propose a Twin-Branch Feature Extraction (TBFE) module, which employs…
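The abstract is cut off before the TBFE and fusion modules are specified, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common HSI practice of classifying each pixel from a square patch around it, and it fuses two branch outputs of shape (C, H, W) with per-channel weights derived from global average plus max pooling and a softmax across the two branches. The names `extract_patch` and `pooling_attention_fusion`, the reflect padding, and the pooling/softmax scheme are hypothetical; random arrays stand in for the CNN (spatial) and ViT (spectral) branch outputs.

```python
import numpy as np


def extract_patch(cube, row, col, size):
    """Return a (size, size, bands) neighborhood centred on pixel (row, col).

    Hypothetical helper: assumes patch-based input and reflect padding so
    that border pixels also get a full patch.
    """
    r = size // 2
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    # Pixel (row, col) sits at (row + r, col + r) in the padded cube.
    return padded[row:row + size, col:col + size, :]


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def pooling_attention_fusion(spatial_feat, spectral_feat):
    """Fuse two (C, H, W) branch outputs with per-channel branch weights.

    Hypothetical scheme: each branch is summarized per channel by global
    average pooling plus global max pooling; a softmax across the two
    branches turns these descriptors into weights that sum to 1 per channel.
    """
    feats = np.stack([spatial_feat, spectral_feat])          # (2, C, H, W)
    desc = feats.mean(axis=(2, 3)) + feats.max(axis=(2, 3))  # (2, C)
    weights = softmax(desc, axis=0)                          # (2, C)
    fused = (weights[:, :, None, None] * feats).sum(axis=0)  # (C, H, W)
    return fused, weights


# Toy usage with random data (no real HSI dataset is assumed).
rng = np.random.default_rng(0)
cube = rng.random((20, 20, 30))       # 20 x 20 pixels, 30 spectral bands
patch = extract_patch(cube, 0, 5, 7)  # border pixel still gets a 7 x 7 patch
spatial = rng.random((16, 7, 7))      # stand-in for the CNN branch output
spectral = rng.random((16, 7, 7))     # stand-in for the ViT branch output
fused, w = pooling_attention_fusion(spatial, spectral)
print(patch.shape, fused.shape, w.shape)  # (7, 7, 30) (16, 7, 7) (2, 16)
```

Because the weights come from a softmax across branches, each fused channel is a convex combination of the two branch channels, so neither branch can be amplified beyond its own range. A learned scoring layer on the pooled descriptors would be a natural extension; it is omitted here to keep the sketch parameter-free.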
