STAND: Semantic Anchoring Constraint with Dual-Granularity Disambiguation for Remote Sensing Image Change Captioning
Yanpei Gong, Beichen Zhang, Hao Wang, Zhaobo Qi, Xinyan Liu, Yuanrong Xu, Ruiyang Gao, Weigang Zhang

TL;DR
STAND is a framework for remote sensing image change captioning (RSICC) that progressively resolves ambiguities in viewpoint, scale, and prior knowledge through semantic anchoring and dual-granularity disambiguation.
Contribution
The paper combines an interpretable constraint on temporal representations, a dual-granularity disambiguation module, and semantic anchoring to improve change captioning accuracy on remote sensing images.
Findings
STAND outperforms existing methods in accuracy on benchmark datasets.
The dual-granularity disambiguation effectively resolves spatial uncertainties.
Semantic concept anchoring improves the translation of features into precise captions.
Abstract
Remote sensing image change captioning (RSICC) aims to describe the difference between two remote sensing images. While recent methods have explored video modeling, they largely overlook the inherent ambiguities in viewpoint, scale, and prior knowledge, lacking effective constraints on the encoder. In this paper, we present STAND, a Semantic Anchoring Constraint with Dual-Granularity Disambiguation for RSICC, to progressively resolve these ambiguities. Specifically, to establish a reliable feature foundation, we first introduce an interpretable constraint to regularize temporal representations. Operating on these purified features, a dual-granularity disambiguation module resolves spatial uncertainties by coupling macro-level global context aggregation for viewpoint confusion with micro-level frequency-refocused attention for small-object scale enhancement. Ultimately, to translate…
