TERDNet: Transformer Encoder-Recurrent Decoder Network for Scene Change Detection

Jiae Yoon; Ue-Hwan Kim

arXiv:2605.20822·cs.CV·May 21, 2026

TL;DR

TERDNet pairs a transformer encoder with a recurrent GRU decoder for scene change detection. The authors report more accurate and more detailed change masks than prior methods on four public benchmarks, along with robustness to viewpoint variations.

Contribution

TERDNet combines a transformer encoder that extracts multi-level features, a fusion module that integrates correlation volumes with those features, and a recurrent decoder that refines the change mask over several iterations rather than in a single step.
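
To make the idea of recurrent decoding concrete, here is a minimal sketch of iterative refinement with a GRU cell. It uses the standard two-gate GRU on a scalar state, not the paper's 3-gate-GRU decoder, whose details are not given on this page. The names `gru_step`, `refine`, and `params`, and all weight values, are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h, x, p):
    """One standard (two-gate) GRU update on a scalar state h with scalar input x."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)                # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)                # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h))   # candidate state
    return (1.0 - z) * h + z * h_cand

def refine(x, p, iterations):
    """Apply the same cell repeatedly, keeping every intermediate state."""
    h = 0.0
    states = []
    for _ in range(iterations):
        h = gru_step(h, x, p)
        states.append(h)
    return states

# Hypothetical weights; a real decoder would learn these and use convolutions.
params = {"wz": 1.0, "uz": 0.0, "wr": 1.0, "ur": 0.0, "wh": 2.0, "uh": 0.0}
states = refine(x=1.0, p=params, iterations=4)
```

A single-step decoder corresponds to stopping at `states[0]`. With these toy weights each further iteration moves the state closer to its candidate value, which is the intuition behind refining a change mask over several passes.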

Findings

01 Outperforms prior methods on four benchmarks.

02 Effective in handling viewpoint misalignments.

03 Pretraining strategies improve segmentation quality.

Abstract

In this work, we address the challenge of Scene Change Detection (SCD), where the goal is to identify variations between two images of the same location captured at different times. Existing SCD models often overlook the varying importance of features across layers, employ single-step decoders that confine refinement, and provide limited insight into encoder pretraining strategies. We propose TERDNet, a Transformer Encoder-Recurrent Decoder Network designed to overcome these limitations. TERDNet consists of a transformer-based encoder that extracts multi-level representations, a feature fusion module that integrates correlation volumes with these features, a recurrent 3-gate-GRU decoder that performs iterative refinement, and a combined convolution-interpolation upsampler that restores fine-grained resolution. Extensive experiments on four public benchmarks show that TERDNet…
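
The abstract mentions correlation volumes without defining them. A common formulation, used here as an assumption and not confirmed as the authors' method, is the all-pairs scaled dot product between the feature vectors of the two images. The sketch below uses plain nested lists. The function names and the toy feature maps are hypothetical.

```python
import math

def correlation_volume(feat_a, feat_b):
    """All-pairs correlation between two H x W x C feature maps (nested lists).

    Returns corr[i][j][k][l] = <feat_a[i][j], feat_b[k][l]> / sqrt(C).
    """
    h, w = len(feat_a), len(feat_a[0])
    c = len(feat_a[0][0])
    scale = 1.0 / math.sqrt(c)
    return [
        [
            [
                [scale * sum(x * y for x, y in zip(feat_a[i][j], feat_b[k][l]))
                 for l in range(w)]
                for k in range(h)
            ]
            for j in range(w)
        ]
        for i in range(h)
    ]

def same_location_score(corr):
    """Correlation of each pixel with the same location in the other image."""
    h, w = len(corr), len(corr[0])
    return [[corr[i][j][i][j] for j in range(w)] for i in range(h)]

# Toy 2x2 maps with 2-channel features; only the bottom-right pixel differs.
feat_t0 = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [0.0, 1.0]]]
feat_t1 = [[[1.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 0.0]]]

corr = correlation_volume(feat_t0, feat_t1)
scores = same_location_score(corr)
```

In this toy case the unchanged pixels correlate strongly with their counterparts while the changed pixel scores zero. Because the volume holds every pair of locations, a decoder could also look up matches at shifted positions, which is one plausible reason such volumes help when the two views are misaligned.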

Code & Models

Repositories

AutoCompSysLab/TERDNet (GitHub)
