# Optimized Reinforcement Learning-Driven Model for Remote Sensing Change Detection

**Authors:** Yan Zhao, Zhiyun Xiao, Tengfei Bao, Yulong Zhou

PMC · DOI: 10.3390/jimaging12030139 · 2026-03-19

## TL;DR

This paper introduces a remote sensing change detection (CD) framework in which a reinforcement learning agent iteratively refines the initial change probability map produced by a U-Net backbone, improving accuracy over one-pass prediction.
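The iterative refinement can be framed as a small Markov decision process over the change probability map. The sketch below illustrates the mechanics only and is not the authors' implementation. The three-way per-pixel action set (decrease, keep, or increase by a step `DELTA`), the error-reduction reward, and the `oracle_policy` helper are all hypothetical. The placeholder policy looks at the label solely to make the loop visible; a trained agent would act from its state representation alone.

```python
import numpy as np

# Hypothetical discrete action set: a per-pixel probability adjustment.
# DELTA is an illustrative step size, not a value taken from the paper.
DELTA = 0.1
ACTIONS = np.array([-DELTA, 0.0, DELTA])  # index 0: decrease, 1: keep, 2: increase


def step(prob, action_idx, target):
    """Apply one corrective action per pixel; return (new_prob, reward).

    The reward (an assumption) is the per-pixel reduction in absolute error
    against the ground-truth change mask, which is available during training.
    """
    new_prob = np.clip(prob + ACTIONS[action_idx], 0.0, 1.0)
    reward = np.abs(prob - target) - np.abs(new_prob - target)
    return new_prob, reward


def oracle_policy(prob, target):
    """Hypothetical placeholder policy: nudge each pixel toward its label.

    A trained PPO agent would choose actions from its state representation
    without access to the target.
    """
    diff = target - prob
    action = np.ones(prob.shape, dtype=int)  # default: keep
    action[diff > DELTA / 2] = 2             # increase
    action[diff < -DELTA / 2] = 0            # decrease
    return action


def refine(prob, target, n_steps=5):
    """Run a fixed-length refinement episode and return (prob, total_reward)."""
    total_reward = 0.0
    for _ in range(n_steps):
        action = oracle_policy(prob, target)
        prob, reward = step(prob, action, target)
        total_reward += float(reward.sum())
    return prob, total_reward


init = np.array([[0.45, 0.8], [0.3, 0.55]])  # toy initial change probabilities
gt = np.array([[1.0, 1.0], [0.0, 0.0]])       # toy ground-truth change mask
refined, episode_return = refine(init, gt)
print(np.round(refined, 2), round(episode_return, 2))
```

Because the reward is the per-pixel drop in absolute error, the episode return telescopes to the initial total error minus the final total error, so a policy that corrects the map earns a positive return.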

## Contribution

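A feedback-driven CD framework that pairs a dual-branch U-Net with a PPO-based reinforcement learning agent, which applies multi-step, pixel-level corrections to the backbone's initial change probability map for adaptive error correction.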
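The abstract names PPO as the agent's training algorithm. For reference, the standard PPO clipped surrogate objective can be sketched in NumPy as follows. Treating each pixel's action at each refinement step as one sample, and the default `clip_eps=0.2`, are assumptions rather than details reported in the paper.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negated PPO clipped surrogate objective, averaged over samples.

    Each sample could be one pixel's action at one refinement step (an
    assumption about how the per-pixel MDP is batched). clip_eps=0.2 is the
    common PPO default, not a value reported by the paper.
    """
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise minimum) bound, negated so it can be minimized.
    return -float(np.mean(np.minimum(unclipped, clipped)))
```

In training, the same expression would be written in an autodiff framework so gradients flow through `logp_new`; this NumPy version only evaluates the loss, which makes the clipping behavior easy to inspect.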

## Key findings

- With SiamU-Net as the backbone, the proposed RL refinement increases mIoU by 2.54 to 6.13 points across the four datasets (CDD, SYSU-CD, PVCD, and BRIGHT).
- The framework improves boundary fidelity and suppresses pseudo-changes caused by shadows and illumination variations.
- The RL module is backbone-agnostic: integrating the same module into other representative CD backbones yields similarly consistent gains.
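mIoU, the metric behind these gains, is commonly computed for binary change detection as the mean of the IoU of the "unchanged" and "changed" classes. The sketch below assumes that definition and 0/1 masks; the paper's exact averaging (for example, per image versus over the whole test set) may differ. Assuming mIoU is reported as a percentage, a gain of 3.07 points corresponds to 0.0307 on the 0 to 1 scale returned here.

```python
import numpy as np


def binary_miou(pred, gt):
    """Mean IoU over the 'unchanged' (0) and 'changed' (1) classes.

    Assumes binary masks of equal shape. A class absent from both masks
    is scored as IoU 1.0 to avoid division by zero (an assumption).
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    ious = []
    for cls_pred, cls_gt in ((~pred, ~gt), (pred, gt)):
        inter = np.logical_and(cls_pred, cls_gt).sum()
        union = np.logical_or(cls_pred, cls_gt).sum()
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))
```

For a 1×4 strip with ground truth `[1, 1, 0, 0]` and prediction `[1, 0, 0, 0]`, the changed-class IoU is 1/2 and the unchanged-class IoU is 2/3, giving an mIoU of about 0.583.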

## Abstract

In recent years, deep learning has driven remarkable progress in remote sensing change detection (CD); however, practical deployment is still hindered by two limitations. First, CD results are easily degraded by imaging-induced uncertainties—mixed pixels and blurred boundaries, radiometric inconsistencies (e.g., shadows and seasonal illumination changes), and slight residual misregistration—leading to pseudo-changes and fragmented boundaries. Second, prevailing methods follow a static one-pass inference paradigm and lack an explicit feedback mechanism for adaptive error correction, which weakens generalization in complex or unseen scenes. To address these issues, we propose a feedback-driven CD framework that integrates a dual-branch U-Net with deep reinforcement learning (RL) for pixel-level probabilistic iterative refinement of an initial change probability map. The backbone produces a preliminary posterior estimate of change likelihood from multi-scale bi-temporal features, while a PPO-based RL agent formulates refinement as a Markov decision process. The agent leverages a state representation that fuses multi-scale features, prediction confidence/uncertainty, and spatial consistency cues (e.g., neighborhood coherence and edge responses) to apply multi-step corrective actions. From an imaging and interpretation perspective, the RL module can be viewed as a learnable, self-adaptive imaging optimization mechanism: for high-risk regions affected by blurred boundaries, radiometric inconsistencies, and local misalignment, the agent performs feedback-driven multi-step corrections to improve boundary fidelity and spatial coherence while suppressing pseudo-changes caused by shadows and illumination variations. Experiments on four datasets (CDD, SYSU-CD, PVCD, and BRIGHT) verify consistent improvements. 
Using SiamU-Net as an example, the proposed RL refinement increases mIoU by 3.07, 2.54, 6.13, and 3.1 points on CDD, SYSU-CD, PVCD, and BRIGHT, respectively, with similarly consistent gains observed when the same RL module is integrated into other representative CD backbones.
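The abstract states that the agent's state fuses multi-scale features, prediction confidence/uncertainty, and spatial consistency cues such as neighborhood coherence and edge responses. The sketch below builds only the map-derived channels from a change probability map `prob`. The specific definitions (confidence as |2p - 1|, binary entropy as uncertainty, one minus the deviation from the 3×3 neighborhood mean as coherence, Sobel gradient magnitude as edge response) are illustrative assumptions, and the backbone's multi-scale features are omitted.

```python
import numpy as np


def box_mean3(x):
    """3x3 neighborhood mean with edge padding."""
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def sobel_magnitude(x):
    """Gradient magnitude from 3x3 Sobel kernels (edge padding).

    Kernels are applied as a correlation; the sign flip relative to a true
    convolution does not affect the magnitude.
    """
    p = np.pad(x, 1, mode="edge")
    h, w = x.shape
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = p[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)


def state_features(prob, eps=1e-7):
    """Stack hypothetical per-pixel state channels into shape (5, H, W)."""
    p = np.clip(prob, eps, 1 - eps)                        # avoid log(0)
    confidence = np.abs(2 * p - 1)                         # 0 at p=0.5, 1 at p in {0, 1}
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # uncertainty in bits
    coherence = 1 - np.abs(p - box_mean3(p))               # agreement with neighbors
    edges = sobel_magnitude(p)                             # boundary response
    return np.stack([p, confidence, entropy, coherence, edges])
```

Stacking the cues as channels lets a convolutional policy network see, at every pixel, both how uncertain the backbone is and whether that pixel sits on a boundary or disagrees with its neighbors, which are the high-risk regions the abstract targets.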

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13027485/full.md

---
Source: https://tomesphere.com/paper/PMC13027485