The Surprising Effectiveness of Canonical Knowledge Distillation for Semantic Segmentation

Muhammad Ali; Kevin Alexander Laube; Madan Ravi Ganesh; Lukas Schott; Niclas Popp; Thomas Brox

arXiv:2604.25530 · cs.CV · April 30, 2026

TL;DR

This paper demonstrates that simple, canonical knowledge distillation methods outperform complex, segmentation-specific approaches when methods are compared at matched wall-clock training compute rather than at equal iteration counts, and that they reach state-of-the-art ResNet-18 results with simpler objectives.

Contribution

The paper shows that, once training compute is matched, canonical logit- and feature-based KD is more effective for semantic segmentation than recent, more complex methods, which calls into question the need for task-specific distillation objectives.
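The compute-matching idea can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the function name, its parameters, and the timings in the usage lines are hypothetical, and it assumes per-iteration cost is roughly constant over training. It converts the wall-clock budget of a reference schedule into the number of iterations a cheaper or more expensive objective could run in the same time.

```python
def matched_iterations(reference_iters: int,
                       reference_sec_per_iter: float,
                       candidate_sec_per_iter: float) -> int:
    """Iterations a candidate method can run within the reference method's
    wall-clock budget (hypothetical helper, assumes constant per-iteration cost)."""
    if reference_iters < 0:
        raise ValueError("reference_iters must be non-negative")
    if reference_sec_per_iter <= 0 or candidate_sec_per_iter <= 0:
        raise ValueError("per-iteration times must be positive")
    # Total wall-clock budget consumed by the reference schedule, in seconds.
    budget_sec = reference_iters * reference_sec_per_iter
    # Whole iterations of the candidate objective that fit in that budget.
    return int(budget_sec // candidate_sec_per_iter)


# Illustrative, made-up timings: a costly objective at 0.5 s/iter for 40,000
# iterations versus a simple objective at 0.25 s/iter.
simple_kd_iters = matched_iterations(40_000, 0.5, 0.25)
print(simple_kd_iters)
```

With these made-up timings, the simpler objective would be allowed twice as many iterations, which is the sense in which equal iteration counts and equal training budgets differ.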

Findings

01. Canonical KD outperforms recent segmentation-specific methods under equal wall-clock compute.
02. Under extended training, feature-based distillation achieves state-of-the-art ResNet-18 results on Cityscapes and ADE20K.
03. A PSPNet ResNet-18 student reaches 99% of its ResNet-101 teacher's mIoU with only a quarter of the parameters.

Abstract

Recent knowledge distillation (KD) methods for semantic segmentation introduce increasingly complex hand-crafted objectives, yet are typically evaluated under fixed iteration schedules. These objectives substantially increase per-iteration cost, meaning equal iteration counts do not correspond to equal training budgets. It is therefore unclear whether reported gains reflect stronger distillation signals or simply greater compute. We show that iteration-based comparisons are misleading: when wall-clock compute is matched, canonical logit- and feature-based KD outperform recent segmentation-specific methods. Under extended training, feature-based distillation achieves state-of-the-art ResNet-18 performance on Cityscapes and ADE20K. A PSPNet ResNet-18 student closely approaches its ResNet-101 teacher despite using only one quarter of the parameters, reaching 99% of the teacher's mIoU on…
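For readers unfamiliar with the two canonical objectives named in the abstract, the following is a minimal NumPy sketch of their textbook forms: a temperature-scaled per-pixel KL divergence on logits (in the style of Hinton et al.) and a mean-squared error between projected student features and teacher features (in the style of FitNets). Whether the paper uses exactly these variants, weights, or layers is an assumption; the function names, tensor layout `(N, C, H, W)`, and the linear projection standing in for a 1x1 convolution are all illustrative choices.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def logit_kd_loss(student_logits: np.ndarray,
                  teacher_logits: np.ndarray,
                  temperature: float = 1.0) -> float:
    """Per-pixel KL(teacher || student) over the class axis, averaged over
    all pixels and scaled by temperature**2. Logits have shape (N, C, H, W)."""
    log_p_student = log_softmax(student_logits / temperature, axis=1)
    log_p_teacher = log_softmax(teacher_logits / temperature, axis=1)
    p_teacher = np.exp(log_p_teacher)
    # Sum over classes gives one KL value per pixel: shape (N, H, W).
    kl_per_pixel = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=1)
    return float(kl_per_pixel.mean() * temperature ** 2)


def feature_kd_loss(student_feat: np.ndarray,
                    teacher_feat: np.ndarray,
                    projection: np.ndarray) -> float:
    """MSE between teacher features and linearly projected student features.
    `projection` has shape (C_teacher, C_student) and acts like a 1x1 conv."""
    projected = np.einsum("oc,nchw->nohw", projection, student_feat)
    return float(np.mean((projected - teacher_feat) ** 2))


# Toy usage with random tensors (shapes are illustrative only).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(2, 19, 8, 8))
student_logits = rng.normal(size=(2, 19, 8, 8))
print(logit_kd_loss(student_logits, teacher_logits, temperature=2.0))
```

In practice either term would be added to the usual cross-entropy segmentation loss with a weighting factor; the weighting and the choice of feature layer are left out here because the summary above does not specify them.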
