Revisiting Continual Semantic Segmentation with Pre-trained Vision Models

Duzhen Zhang; Yong Ren; Wei Cong; Junhao Zheng; Qiaoyi Su; Shuncheng Jia; Zhong-Zhi Li; Xuanle Zhao; Ye Bai; Feilong Chen; Qi Tian; Tielin Zhang

arXiv:2508.04267 · cs.CV · August 7, 2025

TL;DR

This paper challenges the assumption that pre-trained vision models suffer severe forgetting in continual semantic segmentation, showing they retain knowledge well and proposing a simple enhancement to improve performance efficiently.

Contribution

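The study systematically revisits forgetting in Direct Fine-Tuning (DFT) for Continual Semantic Segmentation (CSS). It shows that Pre-trained Vision Models (PVMs) are inherently resistant to forgetting, and it introduces DFT*, a variant of DFT built on simple strategies that outperforms more complex methods.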

Findings

01. PVMs retain knowledge with minimal forgetting during DFT.

02. Forgetting mainly results from classifier drift, not backbone degradation.

03. DFT* achieves superior performance with fewer parameters and less training time.

Abstract

Continual Semantic Segmentation (CSS) seeks to incrementally learn to segment novel classes while preserving knowledge of previously encountered ones. Recent advancements in CSS have been largely driven by the adoption of Pre-trained Vision Models (PVMs) as backbones. Among existing strategies, Direct Fine-Tuning (DFT), which sequentially fine-tunes the model across classes, remains the most straightforward approach. Prior work often regards DFT as a performance lower bound due to its presumed vulnerability to severe catastrophic forgetting, leading to the development of numerous complex mitigation techniques. However, we contend that this prevailing assumption is flawed. In this paper, we systematically revisit forgetting in DFT across two standard benchmarks, Pascal VOC 2012 and ADE20K, under eight CSS settings using two representative PVM backbones: ResNet101 and Swin-B. Through a…
