SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining
Nassim Ait Ali Braham, Aaron Banze, Conrad M. Albrecht, Julien Mairal, Jocelyn Chanussot, Xiao Xiang Zhu

TL;DR
SpectralEarth-FM is a hierarchical transformer for joint pretraining on hyperspectral imagery (HSI) and co-located multisensor Earth observation (EO) data. It is pretrained on a large curated dataset, SpectralEarth-MM, with the aim of improving performance on downstream EO tasks.
Contribution
The paper presents SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality, and SpectralEarth-MM, a large pretraining dataset that co-locates HSI with other EO sensors.
Findings
Achieves state-of-the-art results on hyperspectral downstream tasks.
Demonstrates effective fusion of hyperspectral and multisensor data.
Pretraining improves generalization across diverse EO benchmarks.
Abstract
Earth observation (EO) foundation models (FMs) are increasingly trained on multisensor data, spanning multispectral imagery (MSI), synthetic aperture radar (SAR), and derived geospatial layers, but hyperspectral imagery (HSI) remains underrepresented. Conversely, existing hyperspectral FMs are trained on HSI alone, leaving joint pretraining and fusion of HSI with co-located EO sensors unexplored. We introduce SpectralEarth-FM, a hierarchical transformer for multisensor EO input with heterogeneous spectral dimensionality. The architecture combines spectral tokenization for hyperspectral inputs, sensor-specific encoders, a cross-sensor fusion module, and a shared hierarchical encoder, enabling joint processing of HSI and lower-channel observations. To pretrain SpectralEarth-FM, we curate SpectralEarth-MM, a dataset that co-locates HSI from three spaceborne sensors (EnMAP, EMIT, DESIS)…
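The abstract names spectral tokenization as the step that lets inputs with very different band counts share one encoder, but it does not say how the tokens are formed. As a rough illustration only, and not the authors' method, the sketch below uses a generic scheme that splits a pixel's spectrum into fixed-size groups of adjacent bands and pads the last group. The function name `spectral_tokens`, the parameter `bands_per_token`, and the band counts are all hypothetical.

```python
from typing import List, Sequence


def spectral_tokens(
    spectrum: Sequence[float],
    bands_per_token: int,
    pad_value: float = 0.0,
) -> List[List[float]]:
    """Split one pixel's spectrum into fixed-size groups of adjacent bands.

    Illustrative assumption: the source does not specify the grouping scheme.
    The last group is padded so every token has the same width.
    """
    if bands_per_token <= 0:
        raise ValueError("bands_per_token must be positive")
    tokens: List[List[float]] = []
    for start in range(0, len(spectrum), bands_per_token):
        group = list(spectrum[start:start + bands_per_token])
        # Pad a short final group up to the common token width.
        group.extend([pad_value] * (bands_per_token - len(group)))
        tokens.append(group)
    return tokens


# Hypothetical band counts, chosen only to keep the example small.
hsi_pixel = [0.1 * i for i in range(10)]  # stand-in for a many-band HSI pixel
msi_pixel = [0.5, 0.6, 0.7, 0.8]          # stand-in for a few-band MSI pixel

hsi_tokens = spectral_tokens(hsi_pixel, bands_per_token=4)
msi_tokens = spectral_tokens(msi_pixel, bands_per_token=4)

# Token counts differ per sensor, but every token has the same width.
print(len(hsi_tokens), len(msi_tokens))  # prints "3 1"
```

Under this assumption, sensors differ only in how many tokens they produce, which is what would allow a later shared encoder to consume them uniformly. A real model would follow the grouping with a learned projection per token.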
