STAND: Semantic Anchoring Constraint with Dual-Granularity Disambiguation for Remote Sensing Image Change Captioning

Yanpei Gong; Beichen Zhang; Hao Wang; Zhaobo Qi; Xinyan Liu; Yuanrong Xu; Ruiyang Gao; Weigang Zhang

arXiv:2604.23309·cs.CV·April 28, 2026

TL;DR

STAND introduces a novel framework for remote sensing image change captioning that progressively resolves viewpoint, scale, and knowledge ambiguities using semantic anchoring and dual-granularity disambiguation.

Contribution

The paper proposes a method that combines an interpretable constraint on temporal representations, dual-granularity disambiguation, and semantic concept anchoring to improve change captioning accuracy for remote sensing images.

Findings

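01. STAND outperforms existing methods in accuracy on benchmark datasets.

02. The dual-granularity disambiguation module resolves spatial uncertainties by pairing macro-level global context aggregation (for viewpoint confusion) with micro-level frequency-refocused attention (for small-object scale).

03. Semantic concept anchoring improves the translation of features into precise captions.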

Abstract

Remote sensing image change captioning (RSICC) aims to describe the difference between two remote sensing images. While recent methods have explored video modeling, they largely overlook the inherent ambiguities in viewpoint, scale, and prior knowledge, lacking effective constraints on the encoder. In this paper, we present STAND, a Semantic Anchoring Constraint with Dual-Granularity Disambiguation for RSICC, to progressively resolve these ambiguities. Specifically, to establish a reliable feature foundation, we first introduce an interpretable constraint to regularize temporal representations. Operating on these purified features, a dual-granularity disambiguation module resolves spatial uncertainties by coupling macro-level global context aggregation for viewpoint confusion with micro-level frequency-refocused attention for small-object scale enhancement. Ultimately, to translate…
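The abstract names the two granularities but gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a feature map of shape (C, H, W), uses mean pooling as a stand-in for macro-level global context aggregation, and uses a softmax over high-pass-filtered magnitude as a stand-in for micro-level frequency-refocused attention. All function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np

# Hypothetical sketch: none of these names or design choices come from the paper.

def aggregate_global_context(feat):
    """Macro level (assumed): add a pooled, image-wide descriptor to every location."""
    context = feat.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    return feat + context

def high_frequency_part(feat, cutoff=0.25):
    """Zero out spatial frequencies whose radius is below `cutoff` cycles per pixel."""
    _, height, width = feat.shape
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    keep = np.sqrt(fy ** 2 + fx ** 2) >= cutoff  # high-pass mask, shape (H, W)
    spectrum = np.fft.fft2(feat, axes=(1, 2))
    return np.fft.ifft2(spectrum * keep, axes=(1, 2)).real

def refocus_weights(feat, cutoff=0.25):
    """Micro level (assumed): softmax over locations of mean high-frequency magnitude."""
    energy = np.abs(high_frequency_part(feat, cutoff)).mean(axis=0)  # shape (H, W)
    scores = np.exp(energy - energy.max())  # subtract max for numerical stability
    return scores / scores.sum()

def dual_granularity(feat, cutoff=0.25):
    """Combine both levels: global context everywhere, extra gain on fine detail."""
    weights = refocus_weights(feat, cutoff)
    return aggregate_global_context(feat) * (1.0 + weights[None, :, :])

# Usage: a single bright pixel stands in for a small changed object.
feat = np.zeros((2, 8, 8))
feat[:, 3, 5] = 1.0
weights = refocus_weights(feat)
refined = dual_granularity(feat)
```

In this toy input the largest attention weight falls on the isolated pixel, which is the behaviour a small-object enhancement step would want; a learned module would replace both the fixed high-pass mask and the mean pooling.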
