Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis

Xingxing Weng; Ruifeng Ni; Chao Pang; XiangYu Hao; Yishan Wang; Xiaokang Zhang; Wei Xu; Gui-Song Xia

arXiv:2604.00820 · cs.CV · April 2, 2026

TL;DR

This paper introduces CLeaRS, a benchmark for evaluating continual vision-language learning in remote sensing. Experiments on it show that catastrophic forgetting is widespread and that existing continual learning methods are of limited effectiveness.

Contribution

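The work presents CLeaRS, the first dedicated benchmark for continual learning in remote sensing vision-language models (RS VLMs), together with three evaluation protocols (long-horizon, modality-incremental, and task-incremental) and an extensive analysis of the continual learning challenges these models face.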

Findings

01. Catastrophic forgetting occurs across all evaluation settings.

02. Existing continual learning methods show limited effectiveness.

03. The benchmark enables systematic assessment of RS VLMs' adaptability.
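
To make "catastrophic forgetting" concrete, here is a minimal sketch of how it is commonly quantified in the continual learning literature. It assumes an accuracy matrix `acc`, where `acc[i][j]` is the score on task `j` after training on tasks `0..i` in sequence. This page does not say which metrics CLeaRS reports, so the function names, the metric definitions, and the toy numbers below are illustrative assumptions, not results or definitions from the paper.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def average_accuracy(acc: Matrix) -> float:
    """Mean score over all tasks after the final training stage."""
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: Matrix) -> float:
    """Mean drop from each earlier task's best past score to its final score.

    For task j, the best past score is taken over stages j..n-2, i.e. from
    the stage where task j was learned up to the stage before the last one.
    The last task is excluded because nothing was trained after it.
    """
    n = len(acc)
    if n < 2:
        return 0.0
    drops = []
    for j in range(n - 1):
        best_past = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_past - acc[-1][j])
    return sum(drops) / len(drops)


# Hypothetical scores for three sequential tasks (made up for illustration).
toy_acc = [
    [0.80, 0.00, 0.00],  # after training on task 0
    [0.60, 0.70, 0.00],  # after training on task 1
    [0.50, 0.55, 0.75],  # after training on task 2
]

print(f"average accuracy:   {average_accuracy(toy_acc):.3f}")
print(f"average forgetting: {average_forgetting(toy_acc):.3f}")
```

In this toy case, task 0 falls from 0.80 to 0.50 and task 1 from 0.70 to 0.55, so the average forgetting is 0.225. A method that mitigates forgetting would bring this value closer to zero while keeping the final average accuracy high.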

Abstract

Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to…
