# Leveraging potential of limpid attention transformer with dynamic tokenization for hyperspectral image classification

**Authors:** Dhirendra Prasad Yadav, Deepak Kumar, Anand Singh Jalal, Bhisham Sharma, Panos Liatsis, Bardia Yousefi

PMC · DOI: 10.1371/journal.pone.0328160 · PLOS One · 2025-08-04

## TL;DR

This paper introduces LSANet, a neural network for hyperspectral image classification that combines 3D and 2D convolution blocks with a lightweight attention block to improve accuracy.

## Contribution

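The limpid attention block (LAB) and a dynamic tokenization scheme based on conditional position encoding (CPE) strengthen the global correlation of spatial and spectral features at a lower computational cost than the multi-head self-attention (MHSA) of a classical vision transformer (ViT).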
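To make the cost baseline concrete, the sketch below counts the multiply-accumulate operations of one standard self-attention layer, the operation that LS-attention is reported to undercut. It uses the usual textbook estimate (four dense projections plus two token-by-token products) and ignores biases and the softmax. The function name `mhsa_macs`, the token counts, and the embedding width are illustrative assumptions, and the sketch says nothing about how LS-attention itself is built, since this summary does not describe its internals.

```python
def mhsa_macs(num_tokens: int, dim: int) -> dict:
    """Approximate multiply-accumulate count for one self-attention layer."""
    # Q, K, V and output projections: four (dim x dim) matrices applied per token.
    projections = 4 * num_tokens * dim * dim
    # Score matrix Q @ K^T plus the weighted sum over V: each costs N * N * dim.
    # Splitting dim across heads leaves this total unchanged.
    pairwise = 2 * num_tokens * num_tokens * dim
    return {
        "projections": projections,
        "pairwise": pairwise,
        "total": projections + pairwise,
    }


# Hypothetical token counts and embedding width, chosen only for illustration.
for n in (49, 196, 784):
    cost = mhsa_macs(n, dim=64)
    print(n, cost["pairwise"], cost["total"])
```

The pairwise term grows with the square of the token count, so it dominates once the number of tokens exceeds twice the embedding width; that quadratic term is what cheaper attention variants generally try to reduce.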

## Key findings

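- LSANet achieved an overall accuracy (OA) of 98.78% on Indian Pines, 98.67% on Pavia University, 97.52% on Salinas Valley, and 89.45% on Botswana.
- The authors report that the model outperformed classical CNN and transformer-based methods in hyperspectral image classification.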
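The accuracy figures are reported as overall accuracy (OA), which is conventionally the fraction of labeled test pixels assigned the correct class. A minimal sketch of that metric follows; the helper name `overall_accuracy` and the toy labels are hypothetical and are not data from the paper.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of labeled pixels whose predicted class matches the ground truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one labeled pixel is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy labels (8 pixels, 4 classes), not results from the paper.
y_true = [0, 0, 1, 1, 2, 2, 2, 3]
y_pred = [0, 0, 1, 2, 2, 2, 2, 3]
oa = overall_accuracy(y_true, y_pred)
print(f"OA = {oa:.2%}")  # 7 of 8 pixels correct -> OA = 87.50%
```

Because OA weights every pixel equally, it can hide poor performance on rare classes, which is why hyperspectral benchmarks often report it alongside per-class measures.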

## Abstract

Hyperspectral data consist of contiguous narrow spectral bands and therefore carry rich spectral but comparatively little spatial information. Convolutional neural networks (CNNs) have emerged as strong contextual-information models for remote sensing applications. However, their architecture limits how well they capture the global correlation of spatial and spectral features, which makes them less reliable for mining and representing the sequential properties of spectral signatures. This article proposes the limpid size attention network (LSANet), which uses 3D and 2D convolution blocks to enhance the spatial-spectral features of the hyperspectral image (HSI). In addition, a limpid attention block (LAB) is designed to provide global correlation of the spectral and spatial features through LS-attention, whose computational cost is lower than that of the multi-head self-attention (MHSA) in the classical vision transformer (ViT). In the ViT encoder, a conditional position encoding (CPE) module dynamically generates tokens from the feature maps to capture a richer contextual representation. LSANet obtained overall accuracy (OA) of 98.78%, 98.67%, 97.52% and 89.45% on the Indian Pines (IP), Pavia University (PU), Salinas Valley (SV) and Botswana datasets, respectively. Its quantitative and qualitative results are considerably better than those of the classical CNN and transformer-based methods.
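As a rough illustration of conditional position encoding, the sketch below follows the commonly used formulation in which tokens are laid out on their spatial grid, passed through a zero-padded depthwise 3x3 convolution, and added back to the input. Whether LSANet's CPE module matches this formulation is an assumption, since the summary gives no implementation details. The function name, the nested-list layout, and the fixed all-ones filter are hypothetical; in a real model the filter weights would be learned.

```python
def conditional_position_encoding(grid, kernel):
    """Add a position signal computed from each token's 3x3 neighborhood.

    grid:   H x W x C nested lists (tokens laid out on their spatial grid)
    kernel: C x 3 x 3 nested lists (one depthwise filter per channel)
    """
    height, width, channels = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for c in range(channels):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        # Zero padding: neighbors outside the grid contribute nothing.
                        if 0 <= ni < height and 0 <= nj < width:
                            acc += kernel[c][di + 1][dj + 1] * grid[ni][nj][c]
                # Residual connection: token plus its conditional encoding.
                out[i][j][c] = grid[i][j][c] + acc
    return out


# Hypothetical 3x3 grid of identical single-channel tokens and an all-ones filter.
grid = [[[1.0] for _ in range(3)] for _ in range(3)]
kernel = [[[1.0] * 3 for _ in range(3)]]
encoded = conditional_position_encoding(grid, kernel)
print(encoded[0][0][0], encoded[0][1][0], encoded[1][1][0])  # corner, edge, center
```

Even though every input token is identical here, the corner, edge, and center outputs differ because the zero padding exposes where each token sits on the grid. The encoding is computed from the feature map itself, so it adapts to the input size instead of relying on a fixed lookup table.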

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12321143/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12321143/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC12321143/full.md

---
Source: https://tomesphere.com/paper/PMC12321143