Beyond GSD-as-Token: Continuous Scale Conditioning for Remote Sensing VLMs
Song Zhang, Yanlong Chen, Yilin Li, Yining Chen, Zili Yi, Xiaowei Zhang, and Yawei Li

TL;DR
This paper introduces ScaleEarth, a novel framework for remote sensing vision-language models that dynamically conditions on ground sampling distance (GSD) as a continuous variable, improving performance across diverse Earth-system tasks.
Contribution
The paper proposes CS-HLoRA, a continuous scale-conditioning method, and a new scale-aware dataset, GeoScale-VQA, advancing remote sensing VLMs beyond discrete GSD tokens.
Findings
ScaleEarth achieves state-of-the-art results on remote sensing benchmarks.
CS-HLoRA effectively modulates model computation based on GSD.
The approach enables dynamic routing and GSD prediction without sensor metadata.
Abstract
Remote sensing vision-language models (RS-VLMs) face a fundamental mismatch with natural-image counterparts: the same geographic object exhibits radically different visual evidence across ground sampling distances (GSDs) spanning multiple orders of magnitude. Yet existing RS-VLMs often discard GSD or inject it as a discrete text token, forcing a single static parameter set to absorb the entire scale spectrum. We introduce ScaleEarth, a parameter-efficient fine-tuning framework built on Qwen3-VL that treats GSD as a continuous conditioning variable governing the model's computation path. At its core, CS-HLoRA (Continuous Scale-Conditioned Hyper-LoRA) modulates the LoRA low-rank subspace through a GSD-driven gate, enabling the model to dynamically route computation by physical scale. To remove reliance on sensor metadata at deployment, we pair CS-HLoRA with SSE-U, a lightweight…
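To make the idea of a GSD-driven gate on the LoRA low-rank subspace concrete, here is a minimal sketch in plain Python. It is not the authors' CS-HLoRA: the abstract does not give the gate's form, so the sigmoid gate, the Fourier features of log10(GSD), and every name here (GatedLoRALinear, scale_features, Wg, bg) are assumptions made for illustration only. The sketch computes y = W0 x + B (g(s) ⊙ A x), where g(s) is a per-rank gate derived from the continuous scale s, so that the same input follows a different low-rank update at different GSDs.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale_features(gsd_m, n_freqs=2):
    """Hypothetical scale encoding: log10(GSD in metres) plus Fourier features.

    Working in log space keeps GSDs that span orders of magnitude comparable.
    Requires gsd_m > 0.
    """
    s = math.log10(gsd_m)
    feats = [s]
    for k in range(n_freqs):
        feats += [math.sin((2 ** k) * s), math.cos((2 ** k) * s)]
    return feats


class GatedLoRALinear:
    """Illustrative linear layer: frozen W0 plus a LoRA update gated by GSD.

    Assumption: one sigmoid gate per rank component. The paper's actual
    gating mechanism may differ.
    """

    def __init__(self, d_in, d_out, rank, n_freqs=2, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_freqs = n_freqs
        self.W0 = rand(d_out, d_in)           # stands in for the frozen base weight
        self.A = rand(rank, d_in)             # LoRA down-projection
        # Standard LoRA initialises B to zero; random here so the gate's
        # effect is visible without training.
        self.B = rand(d_out, rank)            # LoRA up-projection
        self.Wg = rand(rank, 1 + 2 * n_freqs)  # gate weights (hypothetical)
        self.bg = [0.0] * rank                 # gate bias (hypothetical)

    def gate(self, gsd_m):
        """Per-rank gate values in (0, 1) computed from the continuous GSD."""
        z = matvec(self.Wg, scale_features(gsd_m, self.n_freqs))
        return [sigmoid(zi + bi) for zi, bi in zip(z, self.bg)]

    def forward(self, x, gsd_m):
        base = matvec(self.W0, x)             # W0 x
        low = matvec(self.A, x)               # A x, in the rank-r subspace
        g = self.gate(gsd_m)
        delta = matvec(self.B, [gi * li for gi, li in zip(g, low)])
        return [b + d for b, d in zip(base, delta)]


layer = GatedLoRALinear(d_in=4, d_out=3, rank=2)
x = [1.0, -0.5, 0.25, 2.0]
y_fine = layer.forward(x, gsd_m=0.3)     # e.g. sub-metre imagery
y_coarse = layer.forward(x, gsd_m=30.0)  # e.g. 30 m imagery
```

The contrast with a discrete GSD token is that the scale never enters the text sequence here: it changes which directions of the low-rank update are active, and nearby GSD values yield nearby gates.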
