Continual Vision-Language Learning for Remote Sensing: Benchmarking and Analysis
Xingxing Weng, Ruifeng Ni, Chao Pang, XiangYu Hao, Yishan Wang, Xiaokang Zhang, Wei Xu, Gui-Song Xia

TL;DR
This paper introduces CLeaRS, a comprehensive benchmark for evaluating continual vision-language learning in remote sensing, and uses it to show that catastrophic forgetting is pervasive and that existing continual learning methods have limited effectiveness.
Contribution
The work presents the first dedicated benchmark, CLeaRS, with evaluation protocols and extensive analysis of continual learning challenges in remote sensing vision-language models.
Findings
Catastrophic forgetting occurs across all evaluation settings.
Existing continual learning methods show limited effectiveness.
The benchmark enables systematic assessment of RS VLMs' adaptability.
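To make the notion of catastrophic forgetting concrete, here is a minimal sketch of how it is often quantified in the continual learning literature. It is not necessarily the metric CLeaRS uses. It assumes a hypothetical score matrix `acc`, where `acc[i][j]` is the score on subset `j` after training on subsets `0..i`. The function names and example numbers are illustrative only, not results from the paper.

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Mean drop from the best earlier score to the final score, per old subset.

    acc[i][j] is the (hypothetical) score on subset j after training on
    subsets 0..i. Entries with j > i are ignored.
    """
    n = len(acc)
    if n < 2:
        return 0.0  # nothing can be forgotten with a single training stage
    drops = []
    for j in range(n - 1):  # the last subset has had no chance to be forgotten
        best_earlier = max(acc[i][j] for i in range(j, n - 1))
        drops.append(best_earlier - acc[n - 1][j])
    return sum(drops) / len(drops)


def final_average(acc: List[List[float]]) -> float:
    """Mean score over all subsets after the final training stage."""
    last = acc[-1]
    return sum(last) / len(last)


# Illustrative numbers only (three sequential subsets).
example = [
    [0.90, 0.00, 0.00],
    [0.70, 0.80, 0.00],
    [0.60, 0.50, 0.85],
]
forgetting = average_forgetting(example)
final_score = final_average(example)
```

A larger `average_forgetting` value means earlier subsets degraded more as later ones were learned, while `final_average` summarizes overall performance at the end of the sequence.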
Abstract
Current remote sensing vision-language models (RS VLMs) demonstrate impressive performance in image interpretation but rely on static training data, limiting their ability to accommodate continuously emerging sensing modalities and downstream tasks. This exposes a fundamental challenge: enabling RS VLMs to continually adapt without catastrophic forgetting. Despite its practical importance, the continual learning capability of RS VLMs remains underexplored, and no dedicated benchmark currently exists. In this work, we present CLeaRS, a comprehensive benchmark for continual vision-language learning in remote sensing. CLeaRS comprises 10 curated subsets with over 207k image-text pairs, spanning diverse interpretation tasks, sensing modalities, and application scenarios. We further define three evaluation protocols: long-horizon, modality-incremental, and task-incremental settings, to…
