Towards Comparable Knowledge Distillation in Semantic Image Segmentation
Onno Niemann, Christopher Vox, and Thorben Werner

TL;DR
This paper analyzes the challenges in comparing knowledge distillation methods for semantic image segmentation due to inconsistent training setups and hyperparameter tuning, and proposes a standardized baseline for fair evaluation.
Contribution
It identifies issues in current comparisons, highlights the importance of hyperparameter tuning, and establishes a robust baseline for future research in the field.
Findings
Hyperparameter tuning significantly affects distillation performance.
Most existing techniques do not outperform the proposed baseline.
Standardized evaluation improves comparability across studies.
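The findings stress hyperparameter tuning without naming specific hyperparameters. As a hedged illustration only, here is the widely used temperature-scaled, per-pixel KL distillation term (Hinton-style logit distillation applied at every pixel). It is not claimed to be the paper's baseline or any particular one of the surveyed loss terms. The names `pixelwise_kd_loss`, `combined_loss`, `temperature`, and `alpha` are hypothetical. The temperature and the loss weight are typical examples of the knobs whose tuning can shift reported results.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one pixel's class logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def pixelwise_kd_loss(student, teacher, temperature=4.0):
    """Mean over pixels of T^2 * KL(p_teacher || p_student) at temperature T.

    `student` and `teacher` are lists of pixels; each pixel is a list of
    per-class logits (i.e. a flattened (N*H*W) x C layout).
    """
    t = temperature
    total = 0.0
    for s_pix, t_pix in zip(student, teacher):
        log_ps = log_softmax([x / t for x in s_pix])
        log_pt = log_softmax([x / t for x in t_pix])
        # KL divergence of the softened teacher from the softened student.
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return t * t * total / len(student)

def combined_loss(ce, kd, alpha=1.0):
    # alpha (the distillation weight) is a tunable hyperparameter.
    return ce + alpha * kd

# Two pixels, three classes (hypothetical logits).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.0, 1.0, 0.0], [0.0, 0.5, 1.5]]
print(pixelwise_kd_loss(teacher, teacher))  # 0.0
print(pixelwise_kd_loss(student, teacher, temperature=2.0))
```

Identical logits give zero loss, and any mismatch gives a positive value whose size depends on both the temperature and the weight used to combine it with the cross-entropy term.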
Abstract
Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research, we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase in student mIoU of 4.54 percentage points and a final performance of 29.19, while Adaptive Perspective Distillation (APD) improves student performance by only 2.06 percentage points yet achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the…
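Subtracting each reported gain from the reported final mIoU gives the student's implied mIoU before distillation. The following is a minimal sketch of that arithmetic. It assumes both gains are absolute mIoU percentage points (the abstract says so explicitly only for APD). The dictionary and function names are illustrative.

```python
# Reported numbers from the abstract (mIoU, in percent).
reported = {
    "SSTKD": {"final": 29.19, "gain": 4.54},
    "APD": {"final": 39.25, "gain": 2.06},
}

def implied_student_baseline(final, gain):
    """Student mIoU without distillation = final mIoU minus reported gain."""
    return round(final - gain, 2)

baselines = {name: implied_student_baseline(**r) for name, r in reported.items()}
gap = round(baselines["APD"] - baselines["SSTKD"], 2)

print(baselines)  # {'SSTKD': 24.65, 'APD': 37.19}
print(gap)        # 12.54
```

Under that assumption, the same student model starts roughly 12.5 points higher in the APD setup, so the two reported gains are measured against very different baselines.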
