TL;DR
This paper investigates how semantic segmentation models can assign plausible but wrong labels under correlation shifts, and introduces diagnostic tools and a flip-risk score to assess robustness.
Contribution
It identifies and quantifies semantic label-flip failures in segmentation under distribution shifts, proposing new metrics and a flip-risk score for robustness evaluation.
Findings
Increasing the correlation during training increases label-flip errors under test conditions.
The flip-risk score can effectively flag flip-prone cases at inference.
Decomposing errors reveals failure modes that traditional overlap metrics do not capture.
Abstract
The robustness of machine learning models can be compromised by spurious correlations between non-causal features in the input data and target labels. A common way to test for such correlations is to train on data where the label is strongly tied to some non-causal cue, then evaluate on examples where that tie no longer holds. This idea is well established for classification tasks, but for semantic segmentation the specific failure modes are not well understood. We show that a model may achieve reasonable overlap while assigning the wrong semantic label, swapping one plausible foreground class for another, even when object boundaries are largely correct. We focus on this semantic label-flip behaviour and quantify it with a simple diagnostic (Flip) that counts how often ground truth foreground pixels are assigned the wrong foreground identity while remaining predicted as foreground. In a…
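The Flip diagnostic described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `flip_rate` and `foreground_iou`, the use of label 0 as background, and the normalisation by the number of ground-truth foreground pixels are all assumptions made for illustration. The abstract only states that Flip counts ground-truth foreground pixels that are predicted as foreground but with the wrong foreground class.

```python
import numpy as np


def flip_rate(gt, pred, background=0):
    """Fraction of ground-truth foreground pixels predicted as a
    different foreground class (hypothetical helper; normalising by
    the ground-truth foreground count is an assumption)."""
    gt = np.asarray(gt)
    pred = np.asarray(pred)
    if gt.shape != pred.shape:
        raise ValueError("gt and pred must have the same shape")
    gt_fg = gt != background
    # Flipped: truly foreground, still predicted foreground, wrong class.
    flipped = gt_fg & (pred != background) & (pred != gt)
    n_fg = int(gt_fg.sum())
    return float(flipped.sum()) / n_fg if n_fg else 0.0


def foreground_iou(gt, pred, background=0):
    """Class-agnostic foreground/background IoU (hypothetical helper)."""
    gt_fg = np.asarray(gt) != background
    pred_fg = np.asarray(pred) != background
    union = int((gt_fg | pred_fg).sum())
    return float((gt_fg & pred_fg).sum()) / union if union else 1.0


# Toy masks: the object region is found exactly, but class 1 is
# predicted as class 2 everywhere.
gt = np.array([[0, 1, 1],
               [0, 1, 1]])
pred = np.array([[0, 2, 2],
                 [0, 2, 2]])

print(foreground_iou(gt, pred))  # region overlap, ignoring class identity
print(flip_rate(gt, pred))       # share of foreground pixels with swapped class
```

In this toy case the foreground region matches perfectly while every foreground pixel carries the wrong class, which is the kind of failure a region-overlap score alone would not reveal. Missed pixels (foreground predicted as background) and false positives are deliberately excluded from the count, following the abstract's wording.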
