AVION: Aerial Vision-Language Instruction from Offline Teacher to Prompt-Tuned Network

Yu Hu; Jianyang Gu; Hao Liu; Yue Cao; Jozsef Hamari; Zheng Liu; Mohsen Zardadi

arXiv:2603.12659 · cs.CV · March 16, 2026

TL;DR

AVION is a knowledge distillation framework that adapts vision-language models to remote sensing imagery. It improves performance on classification and retrieval tasks by leveraging semantically rich textual prototypes and prompt tuning.

Contribution

It introduces a framework that combines semantically rich textual prototypes with prompt tuning to adapt vision-language models effectively to remote sensing imagery.
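
The abstract describes the teacher's prototypes as class descriptions collected from a large language model and verified against remote sensing image features. As a rough illustration only, here is a minimal NumPy sketch of how such a filter-then-average step could look. It assumes precomputed embeddings of equal dimension, and it uses a hypothetical validity test: a description is kept if its similarity to its own class's image centroid exceeds its best similarity to any other class by more than `tau`. The names `build_prototypes` and `tau`, the centroid rule, and the fallback are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale rows to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(desc_emb, desc_labels, img_emb, img_labels, num_classes, tau=0.0):
    """Hypothetical teacher step: filter LLM descriptions, then average survivors.

    desc_emb:    (D, d) text embeddings of LLM-generated class descriptions
    desc_labels: (D,)   integer class index of each description
    img_emb:     (N, d) image embeddings of labeled remote sensing images
    img_labels:  (N,)   integer class index of each image
    Assumes every class has at least one description and one image.
    Returns (num_classes, d) unit-norm prototypes and a boolean keep mask.
    """
    desc_emb = l2_normalize(desc_emb)
    img_emb = l2_normalize(img_emb)

    # Visual anchor per class: normalized mean of that class's image embeddings.
    centroids = l2_normalize(
        np.stack([img_emb[img_labels == c].mean(axis=0) for c in range(num_classes)])
    )

    sims = desc_emb @ centroids.T                  # (D, C) cosine similarities
    rows = np.arange(len(desc_labels))
    own = sims[rows, desc_labels]                  # similarity to the labeled class

    # Best similarity to any other class; the own class is masked out.
    others = sims.copy()
    others[rows, desc_labels] = -np.inf
    margin = own - others.max(axis=1)

    # Assumed validity rule: closer to its own class than to any other, by > tau.
    keep = margin > tau

    protos = []
    for c in range(num_classes):
        mask = keep & (desc_labels == c)
        if not mask.any():
            # Fallback (assumption): if every description of class c was rejected, keep them all.
            mask = desc_labels == c
        protos.append(desc_emb[mask].mean(axis=0))
    return l2_normalize(np.stack(protos)), keep
```

In a toy two-class case, a description labeled as class 0 whose embedding actually sits next to class 1's images would fail the margin test and be excluded from class 0's prototype.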

Findings

01. Improves few-shot classification accuracy.

02. Improves mean recall in cross-modal retrieval.

03. Maintains generalization to novel categories.

Abstract

Adapting vision-language models to remote sensing imagery remains challenging due to two key factors: limited semantic coverage in textual representations and insufficient adaptability of visual features. These issues are particularly significant in aerial scenes, which involve diverse visual appearances and fine-grained object distinctions. We propose AVION, a knowledge distillation framework tailored for remote sensing adaptation of vision-language models. The teacher module constructs semantically rich textual prototypes by collecting descriptions from a large language model and verifying their validity against remote sensing image features. The student module integrates lightweight, learnable prompts into both the vision and language encoders and is guided by the teacher to align embeddings and their cross-modal relationships. Once trained, the student operates independently during inference.…
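
To make the student's objective more concrete, the following is a minimal NumPy sketch of one plausible loss. It assumes the student's prompted encoders produce image embeddings and per-class text embeddings, and that the offline teacher supplies frozen image features plus the textual prototypes. The two distillation terms loosely mirror the abstract's wording ("align embeddings and their cross-modal relationships"): a cosine term between student text embeddings and teacher prototypes, and a KL term between teacher and student image-to-class similarity distributions. The cross-entropy term, the temperature `temp`, the weights `alpha` and `beta`, and the name `student_loss` are hypothetical choices, not taken from the paper. NumPy only evaluates the loss; actual prompt tuning would need an autodiff framework such as PyTorch or JAX.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale rows to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(z, axis=-1):
    """Numerically stable log-softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def student_loss(s_img, s_txt, t_img, t_proto, labels, temp=0.07, alpha=1.0, beta=1.0):
    """Hypothetical student objective (loss value only, no gradients).

    s_img:   (B, d) student image embeddings (prompted vision encoder)
    s_txt:   (C, d) student class text embeddings (prompted text encoder)
    t_img:   (B, d) frozen teacher image features for the same batch
    t_proto: (C, d) teacher textual prototypes, same class order as s_txt
    labels:  (B,)   integer class labels
    """
    s_img, s_txt = l2_normalize(s_img), l2_normalize(s_txt)
    t_img, t_proto = l2_normalize(t_img), l2_normalize(t_proto)

    # Embedding alignment: mean of (1 - cosine) between each student text
    # embedding and the matching teacher prototype.
    emb_align = np.mean(1.0 - np.sum(s_txt * t_proto, axis=1))

    # Cross-modal relation: KL(teacher || student) between image-to-class
    # similarity distributions, averaged over the batch.
    log_p_s = log_softmax(s_img @ s_txt.T / temp)
    log_p_t = log_softmax(t_img @ t_proto.T / temp)
    rel_kl = np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=1))

    # Supervised cross-entropy on the student's own logits (an added assumption).
    ce = -np.mean(log_p_s[np.arange(len(labels)), labels])

    total = ce + alpha * emb_align + beta * rel_kl
    return total, {"ce": ce, "emb_align": emb_align, "rel_kl": rel_kl}
```

When the student's embeddings coincide with the teacher's, both distillation terms vanish and only the cross-entropy remains. Under the assumption that only the prompts are trainable, the teacher's prototypes and image features can be computed once offline. This is consistent with the abstract's statement that the student runs on its own at inference.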

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications