Remote Sensing Image Change Detection with Transformers

Hao Chen; Zipeng Qi; Zhenwei Shi

arXiv:2103.00208 · cs.CV · July 13, 2021

TL;DR

This paper introduces a bitemporal image transformer (BIT) for remote sensing change detection. BIT models spatial-temporal context efficiently through a small set of semantic tokens and outperforms purely convolutional methods in accuracy at lower computational cost.

Contribution

The paper proposes a novel transformer-based framework that uses semantic tokens to model spatial-temporal context in remote sensing change detection, improving efficiency and accuracy over existing methods.

Findings

01. Outperforms convolutional baselines with 3x lower computational costs

02. Surpasses several state-of-the-art attention-based methods in accuracy

03. Effective with a simple ResNet18 backbone without complex structures

Abstract

Modern change detection (CD) has achieved remarkable success by the powerful discriminative ability of deep convolutions. However, high-resolution remote sensing CD remains challenging due to the complexity of objects in the scene. Objects with the same semantic concept may show distinct spectral characteristics at different times and spatial locations. Most recent CD pipelines using pure convolutions are still struggling to relate long-range concepts in space-time. Non-local self-attention approaches show promising performance via modeling dense relations among pixels, yet are computationally inefficient. Here, we propose a bitemporal image transformer (BIT) to efficiently and effectively model contexts within the spatial-temporal domain. Our intuition is that the high-level concepts of the change of interest can be represented by a few visual words, i.e., semantic tokens. To achieve…
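The idea of summarizing each image with a few semantic tokens can be illustrated with a minimal sketch. The abstract is truncated here, so the pooling scheme below (a softmax over pixel positions used as token weights) is an assumption for illustration rather than the authors' confirmed method. The function name `semantic_tokens`, the projection matrix `proj`, and all sizes are hypothetical, and random arrays stand in for backbone features.

```python
import numpy as np

def semantic_tokens(features, proj):
    """Pool an (H, W, C) feature map into L tokens of size C.

    features: (H, W, C) array, e.g. backbone output for one image.
    proj:     (C, L) array, a hypothetical learned pointwise projection
              that scores every pixel for each of the L tokens.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)                     # (N, C), N = H * W
    scores = flat @ proj                                  # (N, L) pixel scores
    scores = scores - scores.max(axis=0, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)         # softmax over pixels
    return attn.T @ flat                                  # (L, C) tokens

# Illustrative sizes only (assumed, not taken from the paper).
H, W, C, L = 64, 64, 32, 4
rng = np.random.default_rng(0)
feat_t1 = rng.standard_normal((H, W, C))  # stand-in for image at time 1
feat_t2 = rng.standard_normal((H, W, C))  # stand-in for image at time 2
proj = rng.standard_normal((C, L))

# Concatenate the tokens of both dates so that context can be modeled
# jointly over space and time in token space.
tokens = np.concatenate(
    [semantic_tokens(feat_t1, proj), semantic_tokens(feat_t2, proj)], axis=0
)

# Number of pairwise relations: dense pixel attention vs. token attention.
n_pixels = 2 * H * W
dense_pairs = n_pixels ** 2
token_pairs = tokens.shape[0] ** 2
```

With these assumed sizes, dense attention over both images would relate 8,192 pixels pairwise (about 67 million pairs), while attention over the 8 pooled tokens involves only 64 pairs. That gap is the efficiency argument the abstract makes against non-local self-attention.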

Taxonomy

Methods: Convolution · 1x1 Convolution · Feature Pyramid Network