TL;DR
This paper introduces HiSem, a hierarchical semantic disentangling network for remote sensing image change captioning (RSICC) that explicitly models different semantic granularities to improve understanding of scene changes.
Contribution
The paper proposes a novel hierarchical disentangling approach with modules for cross-temporal attention and adaptive semantic routing, addressing semantic entanglement issues in RSICC.
Findings
Achieves a +7.52% BLEU-4 improvement on the WHU-CDC dataset.
Explicit semantic disentangling enhances change understanding.
Outperforms previous methods on benchmark datasets.
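For readers unfamiliar with the metric behind the headline number, the following is a minimal sketch of sentence-level BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, scaled by a brevity penalty. It assumes a single reference caption, whitespace tokenization, and no smoothing; the function names `ngrams` and `bleu4` are illustrative, and the paper's actual evaluation toolkit is not stated here and may differ (for example, corpus-level scoring with multiple references).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against one reference (illustrative only)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: an n-gram is credited at most as often as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / 4)
```

A caption identical to its reference scores 1.0, while a caption that shares only some 4-grams with the reference scores strictly between 0 and 1.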
Abstract
Remote sensing image change captioning (RSICC) aims to achieve high-level semantic understanding of genuine changes occurring between bi-temporal images. Despite notable progress, existing methods are fundamentally limited by a shared modeling assumption: changed and unchanged image pairs, which have intrinsically different semantic granularities, are processed under a unified modeling strategy. This modeling inconsistency leads to semantic entanglement between coarse-grained change existence judgment and fine-grained semantic understanding. To address the above limitation, we propose a novel hierarchical semantic disentangling network (HiSem) that explicitly disentangles semantic representations of different granularities. Specifically, we first introduce the Bidirectional Differential Attention Modulation (BDAM) module that leverages discrepancy-aware attention to enhance…
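The separation the abstract draws between coarse-grained change existence judgment and fine-grained semantic understanding can be illustrated with a toy two-stage pipeline. This is a conceptual sketch only, not HiSem or its BDAM module: the feature vectors, the mean-absolute-difference score, the fixed `threshold`, and the names `change_score` and `route_caption` are all assumptions made for illustration, whereas the paper describes learned attention and adaptive routing.

```python
from typing import Callable, Sequence


def change_score(feat_before: Sequence[float], feat_after: Sequence[float]) -> float:
    """Coarse stage (hypothetical): mean absolute difference of two feature vectors."""
    if not feat_before or len(feat_before) != len(feat_after):
        raise ValueError("feature vectors must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(feat_before, feat_after)) / len(feat_before)


def route_caption(
    feat_before: Sequence[float],
    feat_after: Sequence[float],
    describe_change: Callable[[Sequence[float], Sequence[float]], str],
    threshold: float = 0.1,  # assumed fixed cut-off; a stand-in for learned routing
) -> str:
    """Decide whether a change exists first, and only then describe it in detail."""
    if change_score(feat_before, feat_after) < threshold:
        # Unchanged pairs need only a coarse answer, so the fine-grained path is skipped.
        return "the scene is unchanged"
    # Changed pairs are handed to a separate fine-grained captioner.
    return describe_change(feat_before, feat_after)
```

Keeping the two decisions in separate code paths mirrors the motivation stated above: unchanged pairs never pass through the fine-grained captioner, so the two kinds of semantics are not forced through one shared strategy.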
