Adapting Vision Transformer for Efficient Change Detection
Yang Zhao, Yuxiang Zhang, Yanni Dong, Bo Du

TL;DR
This paper introduces an efficient fine-tuning method for vision transformers in change detection tasks, significantly reducing training time and resource usage while maintaining high performance across multiple benchmarks.
Contribution
It proposes a decoupled tuning framework that freezes pretrained encoder parameters and adds trainable components, enabling resource-efficient change detection.
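The freeze-and-add idea can be illustrated with a toy sketch in plain Python. This is not the authors' architecture: the tiny linear "encoder", the logistic-regression "head", the absolute feature difference, and every name below (`ENCODER_W`, `encode`, `change_features`, `predict`, `train_head`) are assumptions made only for illustration. The point is that the encoder weights are never updated, and only the small added component is trained.

```python
import math

# Hypothetical frozen "encoder": a fixed 2x2 linear map followed by tanh.
# Stored as a tuple of tuples so it cannot be updated in place.
ENCODER_W = ((0.5, -0.3), (0.2, 0.8))


def encode(x):
    """Apply the frozen encoder to one input vector."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in ENCODER_W]


def change_features(x_before, x_after):
    """Absolute difference between the frozen features of the two dates."""
    return [abs(a - b) for a, b in zip(encode(x_before), encode(x_after))]


def predict(head, x_before, x_after):
    """Trainable head: logistic regression on the feature difference."""
    diff = change_features(x_before, x_after)
    z = head["b"] + sum(w * d for w, d in zip(head["w"], diff))
    return 1.0 / (1.0 + math.exp(-z))


def train_head(head, samples, lr=0.5, epochs=500):
    """Plain SGD on binary cross-entropy; only the head parameters change."""
    for _ in range(epochs):
        for x_before, x_after, label in samples:
            diff = change_features(x_before, x_after)
            # Gradient of the cross-entropy loss with respect to the logit.
            grad_logit = predict(head, x_before, x_after) - label
            head["w"] = [w - lr * grad_logit * d for w, d in zip(head["w"], diff)]
            head["b"] -= lr * grad_logit
    return head


# Toy bi-temporal pairs: label 0 = unchanged, label 1 = changed.
samples = [
    ([1.0, 0.0], [1.0, 0.0], 0),
    ([0.0, 1.0], [0.0, 1.0], 0),
    ([1.0, 0.0], [0.0, 1.0], 1),
    ([0.0, 1.0], [1.0, 1.0], 1),
]

head = train_head({"w": [0.0, 0.0], "b": 0.0}, samples)
p_unchanged = predict(head, [1.0, 0.0], [1.0, 0.0])
p_changed = predict(head, [1.0, 0.0], [0.0, 1.0])
print("P(change | identical pair):", round(p_unchanged, 3))
print("P(change | different pair):", round(p_changed, 3))
```

Because gradients are computed only for the head, the per-step cost and the optimizer state scale with the added parameters rather than with the encoder. The same reasoning, at a much larger scale, is what makes frozen-encoder tuning cheap.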
Findings
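Achieves competitive or better results across six change detection benchmarks; training on LEVIR-CD takes only about half an hour.
Training on LEVIR-CD uses 9 GB of memory, which puts it within reach of most researchers.
The decoupled tuning framework extends to various change detection tasks.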
Abstract
Most change detection models based on vision transformers currently follow a "pretraining then fine-tuning" strategy. This involves initializing the model weights using large-scale classification datasets, which can be either natural images or remote sensing images. However, fully tuning such a model requires significant time and resources. In this paper, we propose an efficient tuning approach that involves freezing the parameters of the pretrained image encoder and introducing additional training parameters. Through this approach, we have achieved competitive or even better results while maintaining extremely low resource consumption across six change detection benchmarks. For example, training time on LEVIR-CD, a change detection benchmark, is only half an hour with 9 GB memory usage, which could be very convenient for most researchers. Additionally, the decoupled tuning framework…
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Chemical Sensor Technologies
