ChangeViT: Unleashing Plain Vision Transformers for Change Detection
Duowang Zhu, Xiaohu Huang, Haiyan Huang, Zhenfeng Shao, and Qimin Cheng

TL;DR
ChangeViT leverages plain vision transformers with specialized modules to significantly improve change detection in remote sensing images, excelling at large-scale and fine-grained change identification across multiple datasets.
Contribution
The paper introduces ChangeViT, a novel framework that enhances plain ViTs for change detection by integrating detail-capture and feature injection modules, achieving state-of-the-art results.
Findings
Outperforms existing methods on multiple datasets
Excels at detecting large-scale changes
Captures fine-grained spatial details effectively
Abstract
Change detection in remote sensing images is essential for tracking environmental changes on the Earth's surface. Despite the success of vision transformers (ViTs) as backbones in numerous computer vision applications, they remain underutilized in change detection, where convolutional neural networks (CNNs) continue to dominate due to their powerful feature extraction capabilities. In this paper, we uncover ViTs' unique advantage in discerning large-scale changes, a capability where CNNs fall short. Capitalizing on this insight, we introduce ChangeViT, a framework that adopts a plain ViT backbone to improve the detection of large-scale changes. This framework is supplemented by a detail-capture module that generates detailed spatial features and a feature injector that efficiently integrates fine-grained spatial information into high-level semantic learning. The feature…
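The data flow described in the abstract (ViT features, detail features, and an injector that merges them) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the injector is modeled as unparameterized cross-attention with a residual connection, the backbone and detail-capture outputs are replaced by random arrays, and the names `inject_details` and `change_score` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(vit_tokens, detail_tokens):
    # Scaled dot-product scores: each ViT token attends to every detail token.
    d = vit_tokens.shape[-1]
    return softmax(vit_tokens @ detail_tokens.T / np.sqrt(d), axis=-1)  # (N, M)

def inject_details(vit_tokens, detail_tokens):
    # Hypothetical feature injector (assumption): cross-attention without
    # learned projections, added back to the ViT tokens as a residual.
    attn = attention_weights(vit_tokens, detail_tokens)
    return vit_tokens + attn @ detail_tokens  # (N, D)

def change_score(feat_t1, feat_t2):
    # Per-token change score as the L2 distance between the two dates
    # (assumption: a real decoder would be learned, not a fixed distance).
    return np.linalg.norm(feat_t1 - feat_t2, axis=-1)  # (N,)

# Stand-ins for backbone outputs: N coarse ViT tokens and M finer detail
# tokens per image, both with D channels. Random values, illustration only.
rng = np.random.default_rng(0)
N, M, D = 16, 64, 32
vit_t1, vit_t2 = rng.normal(size=(N, D)), rng.normal(size=(N, D))
detail_t1, detail_t2 = rng.normal(size=(M, D)), rng.normal(size=(M, D))

fused_t1 = inject_details(vit_t1, detail_t1)
fused_t2 = inject_details(vit_t2, detail_t2)
scores = change_score(fused_t1, fused_t2)
```

In this sketch the coarse tokens act as queries, so the output keeps the ViT token grid while pulling in information from the finer detail tokens. Which side queries which is a design choice the abstract does not specify.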
Taxonomy
Topics: Time Series Analysis and Forecasting
