Spectrally Distilled Representations Aligned with Instruction-Augmented LLMs for Satellite Imagery

Minh Kha Do; Wei Xiang; Kang Han; Di Wu; Khoa Phan; Yi-Ping Phoebe Chen; Gaowen Liu; Ramana Rao Kompella

arXiv:2602.22613·cs.CV·February 27, 2026

TL;DR

SATtxt introduces a spectrum-aware vision-language model for satellite imagery that learns spectral cues during training and operates with RGB inputs at inference, enhancing zero-shot classification and retrieval performance.

Contribution

The paper proposes a two-stage framework for satellite imagery that combines spectral representation distillation with spectrally grounded alignment to instruction-augmented LLMs.

Findings

01. Improves zero-shot classification by 4.2% on average

02. Enhances retrieval accuracy by 5.9%

03. Boosts linear probing performance by 2.7%

Abstract

Vision-language foundation models (VLFMs) promise zero-shot and retrieval understanding for Earth observation. While operational satellite systems often lack full multi-spectral coverage, making RGB-only inference highly desirable for scalable deployment, the adoption of VLFMs for satellite imagery remains hindered by two factors: (1) multi-spectral inputs are informative but difficult to exploit consistently due to band redundancy and misalignment; and (2) CLIP-style text encoders limit semantic expressiveness and weaken fine-grained alignment. We present SATtxt, a spectrum-aware VLFM that operates with RGB inputs only at inference while retaining spectral cues learned during training. Our framework comprises two stages. First, Spectral Representation Distillation transfers spectral priors from a frozen multi-spectral teacher to an RGB student via a lightweight projector. Second,…
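The first stage described in the abstract (a frozen multi-spectral teacher, an RGB student, and a lightweight projector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not state the loss function, the projector design, or any feature dimensions, so the cosine-distance objective, the single linear projector, and the names `project`, `cosine_distance`, and `distillation_loss` are hypothetical and are not taken from the SATtxt code.

```python
import math
import random


def project(features, weights):
    """Hypothetical linear projector: maps a student (RGB) feature vector
    into the teacher's feature space. `weights` is a list of rows."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine_distance(a, b):
    """1 - cosine similarity: 0 when aligned, 2 when opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def distillation_loss(student_batch, teacher_batch, weights):
    """Mean cosine distance between projected student features and teacher
    features. The teacher is frozen, so its features are plain constants;
    in training, only the student and projector would receive updates.
    The choice of cosine distance is an assumption, not the paper's loss."""
    distances = [
        cosine_distance(project(s, weights), t)
        for s, t in zip(student_batch, teacher_batch)
    ]
    return sum(distances) / len(distances)


# Toy usage with random stand-in features (dimensions are arbitrary).
random.seed(0)
student_dim, teacher_dim, batch_size = 4, 3, 2
projector = [[random.gauss(0, 1) for _ in range(student_dim)]
             for _ in range(teacher_dim)]
student_feats = [[random.gauss(0, 1) for _ in range(student_dim)]
                 for _ in range(batch_size)]
teacher_feats = [[random.gauss(0, 1) for _ in range(teacher_dim)]
                 for _ in range(batch_size)]
loss = distillation_loss(student_feats, teacher_feats, projector)
```

In this sketch the loss always lies between 0 and 2, and lower values mean the projected RGB features point in nearly the same direction as the teacher's multi-spectral features. At inference only the RGB branch would be needed, which matches the RGB-only deployment goal stated in the abstract.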

Code & Models

Models

ikhado/sattxt (model on Hugging Face)

Taxonomy

Topics: Remote-Sensing Image Classification · Advanced Neural Network Applications · Multimodal Machine Learning Applications