MV-CC: Mask Enhanced Video Model for Remote Sensing Change Caption

Ruixun Liu; Kaiyu Li; Jiayi Song; Dongwei Sun; Xiangyong Cao

arXiv:2410.23946 · cs.CV · November 1, 2024



TL;DR

This paper introduces MV-CC, a video model-based approach to remote sensing change captioning. It simplifies the architecture by removing the need for hand-designed fusion modules and uses change masks to focus the model on changed regions, which improves captioning performance.

Contribution

The paper proposes a mask-enhanced video model for change captioning that eliminates manual design of the fusion module. It relies instead on an off-the-shelf video encoder and on change masks that direct the model toward changed regions, improving caption accuracy.

Findings

01. Outperforms state-of-the-art RSICC methods

02. Uses an off-the-shelf video encoder to extract spatial and temporal features

03. Employs change masks to focus the model on regions of interest
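The claim that a video encoder captures spatial and temporal features in one pass can be made concrete at the tokenization step. Many video transformers (for example ViViT and VideoMAE) embed a clip with non-overlapping 3D "tubelets"; with a tubelet depth of 2, every token is computed from the same patch at both acquisition dates, so temporal mixing happens inside the embedding rather than in a separate fusion module. The NumPy sketch below assumes co-registered (C, H, W) images and a random projection weight; the function names, sizes, and weights are hypothetical illustrations, not MV-CC's code or necessarily the encoder it uses.

```python
import numpy as np


def bitemporal_to_clip(img_t1: np.ndarray, img_t2: np.ndarray) -> np.ndarray:
    """Stack two co-registered (C, H, W) images into a (C, T=2, H, W) clip."""
    if img_t1.shape != img_t2.shape:
        raise ValueError("bi-temporal images must have the same shape")
    return np.stack([img_t1, img_t2], axis=1)


def tubelet_embed(clip: np.ndarray, weight: np.ndarray,
                  patch: int = 16, tubelet: int = 2) -> np.ndarray:
    """Non-overlapping 3D patch ("tubelet") embedding.

    Equivalent to a bias-free Conv3d with kernel = stride = (tubelet, patch, patch).
    clip:   (C, T, H, W)
    weight: (D, C, tubelet, patch, patch)
    returns (N, D) with N = (T // tubelet) * (H // patch) * (W // patch),
    ordered time-major, then row-major over the patch grid.
    """
    C, T, H, W = clip.shape
    t, h, w = T // tubelet, H // patch, W // patch
    # Crop to whole blocks, then split every axis into (num_blocks, block_size).
    x = clip[:, : t * tubelet, : h * patch, : w * patch]
    x = x.reshape(C, t, tubelet, h, patch, w, patch)
    # Sum over channels and within-block offsets; keep the (t, h, w) block grid.
    tokens = np.einsum("cakbpeq,dckpq->abed", x, weight)
    return tokens.reshape(t * h * w, weight.shape[0])


rng = np.random.default_rng(0)
before = rng.standard_normal((3, 32, 32))         # hypothetical image, date 1
after = rng.standard_normal((3, 32, 32))          # hypothetical image, date 2
w_embed = rng.standard_normal((8, 3, 2, 16, 16))  # hypothetical D = 8
tokens = tubelet_embed(bitemporal_to_clip(before, after), w_embed)
print(tokens.shape)  # (4, 8): one token per 16x16 patch, spanning both dates
```

Because the temporal kernel covers both frames, altering only the second image changes every token it overlaps; in an encoder-fusion-decoder design, that cross-date interaction would instead be the fusion module's job.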

Abstract

Remote sensing image change caption (RSICC) aims to provide natural language descriptions of the changes between bi-temporal remote sensing images. Since the change caption (CC) task requires both spatial and temporal features, previous works follow an encoder-fusion-decoder architecture: an image encoder extracts spatial features, and a fusion module integrates them and extracts temporal features, which has led to increasingly complex, manually designed fusion modules. In this paper, we introduce a novel video model-based paradigm that requires no fusion module design and propose a Mask-enhanced Video model for Change Caption (MV-CC). Specifically, we use an off-the-shelf video encoder to simultaneously extract the temporal and spatial features of bi-temporal images. Furthermore, the types of changes in the CC are set based on specific task requirements, and to enable the model to…
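The abstract is cut off before it explains how the change masks are applied, so the following is only one plausible illustration, not necessarily the paper's mechanism. A binary change mask is average-pooled onto the encoder's patch grid, and each token is scaled by (1 + alpha * m), where m is the fraction of changed pixels in that patch. The function names, the reweighting rule, and alpha are hypothetical; tokens are assumed to be laid out in row-major patch order.

```python
import numpy as np


def mask_to_patch_weights(mask: np.ndarray, patch: int = 16) -> np.ndarray:
    """Average-pool a binary (H, W) change mask onto the patch grid.

    Returns a flat (h * w,) array in row-major patch order; each value is the
    fraction of changed pixels inside that patch.
    """
    H, W = mask.shape
    h, w = H // patch, W // patch
    m = mask[: h * patch, : w * patch].astype(np.float64)
    return m.reshape(h, patch, w, patch).mean(axis=(1, 3)).reshape(-1)


def mask_enhance(tokens: np.ndarray, mask: np.ndarray,
                 patch: int = 16, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical mask enhancement: scale token i by (1 + alpha * m_i).

    Tokens over changed pixels are amplified; tokens over unchanged pixels
    keep their original values (m_i = 0), so context is not discarded.
    """
    weights = mask_to_patch_weights(mask, patch)
    if weights.shape[0] != tokens.shape[0]:
        raise ValueError("token count does not match the mask's patch grid")
    return tokens * (1.0 + alpha * weights)[:, None]


change_mask = np.zeros((32, 32), dtype=bool)  # hypothetical 32x32 mask
change_mask[:16, :16] = True    # top-left patch fully changed
change_mask[16:, 16:24] = True  # bottom-right patch half changed
print(mask_to_patch_weights(change_mask).tolist())  # [1.0, 0.0, 0.0, 0.5]
enhanced = mask_enhance(np.ones((4, 8)), change_mask)
```

Scaling by (1 + alpha * m) rather than by m alone keeps unchanged regions visible to the caption decoder, which presumably still needs surrounding context to describe what changed.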


Code & Models

Repositories

liuruixun/mv-cc
PyTorch · Official


Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Satellite Image Processing and Photogrammetry

Methods: Sparse Evolutionary Training · Focus