Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness
Steven Landgraf, Markus Hillemann, Markus Ulrich

TL;DR
This paper argues that semi-supervised segmentation models should be evaluated on reliability and robustness as well as accuracy, and proposes a comprehensive metric, the Reliable Segmentation Score (RSS), for that purpose.
Contribution
The paper introduces the Reliable Segmentation Score (RSS), a novel metric that combines accuracy, calibration, and uncertainty to evaluate segmentation models holistically.
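To make the idea of folding calibration into a single score more concrete, here is a minimal sketch. It is not the paper's definition of RSS, which this summary does not state. The function `expected_calibration_error` implements the standard binned ECE, a common calibration measure. The function `harmonic_mean` is a hypothetical aggregation rule, chosen here only because it penalizes a low value in any one component. The choice of components and the example numbers are assumptions for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE over per-pixel confidences in [0, 1]."""
    confidences = np.asarray(confidences, dtype=float).ravel()
    correct = np.asarray(correct, dtype=float).ravel()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins: index i means edges[i] < confidence <= edges[i + 1].
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # Gap between accuracy and mean confidence, weighted by bin size.
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return float(ece)


def harmonic_mean(scores):
    """Hypothetical aggregation (an assumption, not the paper's formula)."""
    scores = np.asarray(scores, dtype=float)
    if np.any(scores <= 0):
        return 0.0
    return float(len(scores) / np.sum(1.0 / scores))


# Illustrative values only: four fully confident pixels, two of them wrong.
ece = expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0])
accuracy = 0.5  # placeholder accuracy score in [0, 1]
combined = harmonic_mean([accuracy, 1.0 - ece])
```

In this toy case the model is maximally overconfident on half of its pixels, so the calibration term `1.0 - ece` drops to 0.5 and pulls the combined score down with it. An accuracy-only evaluation would not register that effect.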
Findings
Semi-supervised models often sacrifice reliability for accuracy.
UniMatchV2 shows robustness but still has reliability issues.
Holistic evaluation metrics like RSS are essential for real-world deployment.
Abstract
Semantic segmentation is critical for scene understanding but demands costly pixel-wise annotations, attracting increasing attention to semi-supervised approaches to leverage abundant unlabeled data. While semi-supervised segmentation is often promoted as a path toward scalable, real-world deployment, it is astonishing that current evaluation protocols exclusively focus on segmentation accuracy, entirely overlooking reliability and robustness. These qualities, which ensure consistent performance under diverse conditions (robustness) and well-calibrated model confidences as well as meaningful uncertainties (reliability), are essential for safety-critical applications like autonomous driving, where models must handle unpredictable environments and avoid sudden failures at all costs. To address this gap, we introduce the Reliable Segmentation Score (RSS), a novel metric that combines…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning
