HiSem: Hierarchical Semantic Disentangling for Remote Sensing Image Change Captioning

Man Wang, Chenyang Liu, Wenjun Li, Feng Ni, Bing Jia, Baoqi Huang, Riting Xia, and Zhenwei Shi

arXiv:2605.15024·cs.CV·May 15, 2026

TL;DR

This paper introduces HiSem, a hierarchical semantic disentangling network for remote sensing image change captioning, explicitly modeling different semantic granularities to improve understanding of scene changes.

Contribution

The paper proposes a novel hierarchical disentangling approach with modules for cross-temporal attention and adaptive semantic routing, addressing semantic entanglement issues in RSICC.

Findings

01. Achieves a +7.52% BLEU-4 improvement on the WHU-CDC dataset.

02. Explicit semantic disentangling improves change understanding.

03. Outperforms previous methods on benchmark datasets.
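
For readers unfamiliar with the BLEU-4 metric cited in the findings, the following is a minimal sketch of sentence-level BLEU-4: clipped 1- to 4-gram precision combined with a brevity penalty. It is not the paper's evaluation code. The function names `ngrams` and `bleu4`, the whitespace tokenization, the absence of smoothing, and the example captions are all assumptions made for illustration, and published captioning scores are typically computed at corpus level with a standard toolkit.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, references):
    """Sentence-level BLEU-4 with uniform weights and no smoothing.

    Hypothetical helper for illustration; tokenization is a plain split().
    """
    cand = candidate.split()
    refs = [r.split() for r in references]
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        total = sum(cand_counts.values())

        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())

        # Without smoothing, any zero precision makes the whole score zero.
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty uses the reference length closest to the candidate length.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the four n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


# Made-up change captions, not taken from any dataset.
reference = "a new building appears near the road"
candidate = "a new building appears near the river"
print(round(bleu4(candidate, [reference]), 4))
```

Because the score is a geometric mean, a caption with no matching 4-gram scores zero under this unsmoothed variant, which is one reason corpus-level aggregation or smoothing is usually preferred for short captions.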

Abstract

Remote sensing image change captioning (RSICC) aims to achieve high-level semantic understanding of genuine changes occurring between bi-temporal images. Despite notable progress, existing methods are fundamentally limited by a shared modeling assumption: changed and unchanged image pairs, which have intrinsically different semantic granularities, are processed under a unified modeling strategy. This modeling inconsistency leads to semantic entanglement between coarse-grained change existence judgment and fine-grained semantic understanding. To address this limitation, we propose a novel hierarchical semantic disentangling network (HiSem) that explicitly disentangles semantic representations of different granularities. Specifically, we first introduce the Bidirectional Differential Attention Modulation (BDAM) module that leverages discrepancy-aware attention to enhance…

Code & Models

Repositories

Man-Wang-star/HiSem (GitHub)