STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection

Xiaowen Ma; Zhenkai Wu; Mengting Ma; Mengjiao Zhao; Fan Yang; Zhenhong Du; Wei Zhang

arXiv:2412.17247 · cs.CV · December 24, 2024

TL;DR

STeInFormer introduces a novel Transformer-based backbone for remote sensing change detection, effectively capturing spatial-temporal interactions and spectral features, leading to superior performance over existing methods.

Contribution

The paper presents the first general backbone network specifically designed for RSCD, incorporating a spatial-temporal interaction Transformer and a multi-frequency token mixer.
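The contribution names a multi-frequency token mixer, and the abstract describes it as parameter-free, but its exact design is not given on this page. Purely as an illustration of what "parameter-free" and "multi-frequency" can mean together, here is a minimal NumPy sketch, assuming a (C, H, W) feature map: the 2D Fourier spectrum is split into radial frequency bands, and each channel group adds back one band, so different groups emphasize different frequency ranges without any learnable weights. The radial band partition, the channel grouping, and the names `radial_band_masks` and `multi_frequency_token_mixer` are hypothetical assumptions, not STeInFormer's actual mixer.

```python
import numpy as np


def radial_band_masks(h, w, num_bands):
    """Hypothetical helper: partition the (h, w) FFT grid into radial bands.

    Returns an array of shape (num_bands, h, w) whose masks sum to 1 everywhere.
    """
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    edges = np.linspace(0.0, radius.max(), num_bands + 1)
    masks = []
    for k in range(num_bands):
        lo, hi = edges[k], edges[k + 1]
        if k == num_bands - 1:
            # Last band is closed on the right so the highest frequency is covered.
            mask = (radius >= lo) & (radius <= hi)
        else:
            mask = (radius >= lo) & (radius < hi)
        masks.append(mask)
    return np.stack(masks).astype(float)


def multi_frequency_token_mixer(x, num_bands=3):
    """Hypothetical parameter-free mixer: channel group k gets band k added back.

    x: real array of shape (C, H, W) with C >= num_bands. No learnable weights.
    """
    x = np.asarray(x, dtype=float)
    c, h, w = x.shape
    if c < num_bands:
        raise ValueError("need at least one channel per frequency band")
    masks = radial_band_masks(h, w, num_bands)
    spectrum = np.fft.fft2(x, axes=(-2, -1))
    out = x.copy()
    for k, idx in enumerate(np.array_split(np.arange(c), num_bands)):
        # Band-limited component of this channel group (masks are symmetric,
        # so the inverse FFT is real up to rounding error).
        band = np.fft.ifft2(spectrum[idx] * masks[k], axes=(-2, -1)).real
        out[idx] += band
    return out
```

Because the masks partition the spectrum, a constant image only reaches the lowest band: with three bands, the first channel group is doubled and the others pass through unchanged.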

Findings

1. Outperforms state-of-the-art methods on three datasets
2. Achieves a favorable efficiency-accuracy trade-off
3. Validates effectiveness through extensive experiments

Abstract

Convolutional neural networks and attention mechanisms have greatly benefited remote sensing change detection (RSCD) because of their outstanding discriminative ability. Existing RSCD methods typically follow a paradigm that uses a non-interactive Siamese network for multi-temporal feature extraction and change detection heads for feature fusion and change representation. However, this paradigm does not account for the characteristics of RSCD in the temporal and spatial dimensions, and its lack of spatial-temporal interaction hinders high-quality feature extraction. To address this problem, we present STeInFormer, a spatial-temporal interaction Transformer architecture for multi-temporal feature extraction, which is the first general backbone network specifically designed for RSCD. In addition, we propose a parameter-free multi-frequency token mixer to integrate…
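To make the contrast in the abstract concrete, here is a minimal NumPy sketch, assuming each image date is already a toy token matrix of shape (N, D_in). In the non-interactive Siamese paradigm, a shared-weight encoder processes the two dates independently, so date-1 features cannot depend on date-2 content until the change head compares them. The interactive variant adds a cross-temporal attention step in which each date's tokens attend to the other date's tokens. The tanh encoder, single-head cross-attention, absolute-difference change head, and all function names are illustrative assumptions, not the STeInFormer architecture.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def encode(tokens, weight):
    # Hypothetical shared encoder: the same weights are used for both dates.
    return np.tanh(tokens @ weight)


def siamese_features(t1, t2, weight):
    # Non-interactive paradigm: each date is encoded on its own.
    return encode(t1, weight), encode(t2, weight)


def cross_temporal_attention(q_tokens, kv_tokens):
    # Single-head scaled dot-product attention from one date to the other.
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ kv_tokens


def interactive_features(t1, t2, weight):
    # Interactive variant: features of each date are refreshed with
    # information from the other date before any change head runs.
    f1, f2 = encode(t1, weight), encode(t2, weight)
    g1 = f1 + cross_temporal_attention(f1, f2)
    g2 = f2 + cross_temporal_attention(f2, f1)
    return g1, g2


def change_map(f1, f2):
    # Hypothetical change head: per-token mean absolute feature difference.
    return np.abs(f1 - f2).mean(axis=-1)
```

The design point is where the two dates meet: in the Siamese version only the change head sees both, so perturbing the second image leaves the first image's features untouched, whereas in the interactive version the features themselves already mix information across time.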

Code & Models

Repositories

xwmaxwma/rschange (PyTorch, official)

Taxonomy

Topics: Remote-Sensing Image Classification

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam