HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Di Wang; Meiqi Hu; Yao Jin; Yuchun Miao; Jiaqi Yang; Yichu Xu; Xiaolei Qin; Jiaqi Ma; Lingyu Sun; Chenxing Li; Chuan Fu; Hongruixuan Chen; Chengxi Han; Naoto Yokoya; Jing Zhang; Minqiang Xu; Lin Liu; Lefei Zhang; Chen Wu; Bo Du; Dacheng Tao; Liangpei Zhang

arXiv:2406.11519·cs.CV·April 2, 2025·3 cites

Open Access · 1 Repo

TL;DR

HyperSIGMA is a scalable, transformer-based foundation model for hyperspectral image interpretation that unifies multiple tasks and scenes, leveraging a novel sparse sampling attention mechanism and a large-scale dataset for superior performance.

Contribution

The paper introduces HyperSIGMA, a vision transformer model with a novel SSA mechanism and a large hyperspectral dataset, enabling cross-task and cross-scene HSI interpretation.

Findings

01. HyperSIGMA outperforms state-of-the-art methods across various HSI tasks.

02. The SSA mechanism effectively captures diverse contextual features.

03. HyperSIGMA demonstrates high scalability, robustness, and real-world applicability.

Abstract

Accurate hyperspectral image (HSI) interpretation is critical for providing valuable insights into various earth observation-related applications such as urban planning, precision agriculture, and environmental monitoring. However, existing HSI processing methods are predominantly task-specific and scene-dependent, which severely limits their ability to transfer knowledge across tasks and scenes, thereby reducing the practicality in real-world applications. To address these challenges, we present HyperSIGMA, a vision transformer-based foundation model that unifies HSI interpretation across tasks and scenes, scalable to over one billion parameters. To overcome the spectral and spatial redundancy inherent in HSIs, we introduce a novel sparse sampling attention (SSA) mechanism, which effectively promotes the learning of diverse contextual features and serves as the basic block of…

Code & Models

Repositories

whu-sigma/hypersigma
PyTorch · Official

Taxonomy

Topics: Remote-Sensing Image Classification

Methods: Softmax · Attention Is All You Need