UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model

Yupeng Gao; Tianyu Li; Guoqing Wang; Yang Yang

arXiv:2605.04409·cs.CV·May 7, 2026

TL;DR

This paper introduces PTNet, a prototype-guided, task-adaptive framework for joint change captioning and detection in urban scenes, along with UCCD, a large UAV-based dataset for urban construction monitoring.

Contribution

The paper proposes an approach that explicitly models structured change semantics, together with a new benchmark dataset for semantic-level change understanding in high-resolution urban imagery.

Findings

01

PTNet outperforms existing methods on the UCCD and WHU-CDC datasets.

02

The UCCD dataset contains 9,000 image pairs and 45,000 sentences (an average of five sentences per pair) for urban construction monitoring.

03

The source code and dataset are publicly available.

Abstract

Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and…
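The prototype-bank and gating ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`prototype_guided_change`, `task_gates`), the use of a plain feature difference as the query, the scaled dot-product attention over prototypes, and the single sigmoid gate per task (a simplification of the multi-head gating the abstract mentions) are all assumptions made for illustration, and random arrays stand in for learned parameters and image features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prototype_guided_change(feat_t1, feat_t2, prototypes):
    """Hypothetical helper: describe each location's change as a
    soft mixture of K prototype vectors (assumed design)."""
    diff = feat_t2 - feat_t1                                  # (N, D)
    scores = diff @ prototypes.T / np.sqrt(diff.shape[-1])    # (N, K)
    weights = softmax(scores, axis=-1)                        # rows sum to 1
    return weights @ prototypes, weights                      # (N, D), (N, K)

def task_gates(change_feat, w_det, w_cap):
    """Hypothetical helper: one sigmoid gate per task, so detection and
    captioning each keep a differently weighted view of the features."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    det = change_feat * sigmoid(change_feat @ w_det)
    cap = change_feat * sigmoid(change_feat @ w_cap)
    return det, cap

# Toy sizes (assumed): N locations, D channels, K prototypes.
rng = np.random.default_rng(0)
N, D, K = 16, 8, 4
feat_t1 = rng.normal(size=(N, D))     # stand-in for "before" features
feat_t2 = rng.normal(size=(N, D))     # stand-in for "after" features
prototypes = rng.normal(size=(K, D))  # stand-in for a learned prototype bank
w_det = rng.normal(size=(D, D))       # stand-in gate weights, detection
w_cap = rng.normal(size=(D, D))       # stand-in gate weights, captioning

change_sem, attn = prototype_guided_change(feat_t1, feat_t2, prototypes)
det_feat, cap_feat = task_gates(change_sem, w_det, w_cap)
print(change_sem.shape, attn.shape, det_feat.shape, cap_feat.shape)
```

In this sketch, each row of `attn` shows how strongly a location's change matches each prototype, and the two gated outputs would feed separate detection and captioning heads in a full model.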

Code & Models

Repositories

G124556/ptnet (GitHub)
