LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift
Haozhe Si, Yuxuan Wan, Yuqing Wang, Minh Do, Han Zhao

TL;DR
This paper introduces LESSViT, a low-rank, sensor-flexible Vision Transformer architecture for hyperspectral imagery that improves cross-sensor generalization by efficiently modeling spatial-spectral interactions.
Contribution
The paper proposes LESSViT with LESS Attention and HyperMAE for robust, efficient hyperspectral representation learning across different sensors and spectral configurations.
Findings
LESSViT enhances robustness under spectral shifts on the SpectralEarth benchmark.
The low-rank factorization in LESS Attention reduces the computational cost of explicit joint spatial-spectral attention.
Explicit spatial-spectral modeling is crucial for scalable hyperspectral learning.
Abstract
Modeling hyperspectral imagery (HSI) across different sensors presents a fundamental challenge due to variations in wavelength coverage, band sampling, and channel dimensionality. As a result, models trained under a fixed spectral configuration often fail to generalize to other sensors. Existing Vision Transformer (ViT) approaches either rely on implicit spectral modeling with fixed channel assumptions or adopt explicit spatial-spectral attention with prohibitive computational cost, leading to a fundamental trade-off between efficiency and expressiveness. In this work, we introduce Low-rank Efficient Spatial-Spectral ViT (LESSViT), a sensor-flexible architecture for cross-spectral generalization. LESSViT is built on LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components, reducing the…
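To make the idea of "separable spatial and spectral components" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the simplest possible case: a single (rank-one) Kronecker term, no learned projections, and mean-pooled descriptors for each axis. The function names (`separable_attention`, `score_counts`) and the token layout `(N, C, D)` are hypothetical choices made for illustration; the actual LESS Attention factorization may differ in all of these respects.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def separable_attention(tokens):
    """Illustrative separable spatial-spectral attention (an assumption,
    not the paper's method).

    tokens: array of shape (N, C, D) with N spatial patches,
            C spectral groups, and D feature dimensions.
    """
    N, C, D = tokens.shape
    # Hypothetical pooled descriptors: one vector per patch, one per band group.
    spatial = tokens.mean(axis=1)   # (N, D)
    spectral = tokens.mean(axis=0)  # (C, D)

    # Two small attention maps instead of one (N*C) x (N*C) map.
    A_s = softmax(spatial @ spatial.T / np.sqrt(D))     # (N, N)
    A_c = softmax(spectral @ spectral.T / np.sqrt(D))   # (C, C)

    # out[n, c] = sum over (m, k) of A_s[n, m] * A_c[c, k] * tokens[m, k]
    out = np.einsum("nm,ck,mkd->ncd", A_s, A_c, tokens)
    return out, A_s, A_c


def score_counts(N, C):
    """Number of attention scores: full joint map vs. separable maps."""
    return (N * C) ** 2, N ** 2 + C ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, C, D = 16, 8, 4
    x = rng.normal(size=(N, C, D))
    out, A_s, A_c = separable_attention(x)

    # The separable form equals a joint attention map with Kronecker structure.
    joint = np.kron(A_s, A_c) @ x.reshape(N * C, D)
    print(np.allclose(out.reshape(N * C, D), joint))
    print(score_counts(N, C))
```

In this sketch the joint attention map over all `N * C` tokens is never materialized unless one explicitly builds `np.kron(A_s, A_c)` for checking. The number of attention scores drops from `(N * C) ** 2` to `N ** 2 + C ** 2`, which is the kind of efficiency-versus-expressiveness trade-off the abstract describes. Because neither map depends on a fixed channel count, the same code runs unchanged for any `C`.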
