On Measuring Localization of Shortcuts in Deep Networks
Nikita Tsoy, Nikola Konstantinov

TL;DR
This paper investigates how shortcuts in deep networks are distributed across layers, revealing that they are spread throughout the model and vary by architecture, which complicates the development of universal mitigation strategies.
Contribution
The paper introduces a novel layer-wise analysis method for localizing shortcuts, providing insight into how shortcuts are distributed across a network and informing more effective, architecture-specific mitigation approaches.
Findings
Shortcuts are distributed across all layers, not localized.
Shallow layers mainly encode spurious features.
Deeper layers tend to forget core features.
Abstract
Shortcuts, spurious rules that perform well during training but fail to generalize, present a major challenge to the reliability of deep networks (Geirhos et al., 2020). However, the impact of shortcuts on feature representations remains understudied, obstructing the design of principled shortcut-mitigation methods. To overcome this limitation, we investigate the layer-wise localization of shortcuts in deep models. Our novel experiment design uses counterfactual training on clean and skewed datasets to quantify each layer's contribution to the accuracy degradation caused by a shortcut-inducing skew. We employ our design to study shortcuts on the CIFAR-10, Waterbirds, and CelebA datasets across VGG, ResNet, DeiT, and ConvNeXt architectures. We find that shortcut learning is not localized in specific layers but distributed throughout the network. Different network parts play different roles in…
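To make the idea of a layer-wise contribution to accuracy degradation concrete, here is a minimal Python sketch of one possible bookkeeping scheme. It is an illustration under stated assumptions, not the authors' procedure: it assumes a hypothetical family of hybrid models in which the first k layers come from training on the skewed dataset and the remaining layers from training on the clean dataset, and it attributes to layer k the accuracy lost when that layer switches from clean to skewed. The function names and the accuracy values are placeholders invented for the example.

```python
from typing import List


def layer_contributions(hybrid_acc: List[float]) -> List[float]:
    """Per-layer accuracy drop under a hypothetical hybrid-model setup.

    Assumption (not from the paper): hybrid_acc[k] is the test accuracy of
    a model whose first k layers were trained on skewed data and whose
    remaining layers were trained on clean data. hybrid_acc[0] is therefore
    the fully clean model and hybrid_acc[-1] the fully skewed one.
    """
    if len(hybrid_acc) < 2:
        raise ValueError("need at least the clean and the skewed accuracy")
    # Drop attributed to layer k: accuracy just before vs. just after
    # that layer is switched from clean to skewed training.
    return [hybrid_acc[k - 1] - hybrid_acc[k] for k in range(1, len(hybrid_acc))]


def contribution_shares(hybrid_acc: List[float]) -> List[float]:
    """Each layer's drop as a fraction of the total clean-to-skewed drop."""
    total = hybrid_acc[0] - hybrid_acc[-1]
    if total == 0:
        raise ValueError("no accuracy degradation to attribute")
    return [drop / total for drop in layer_contributions(hybrid_acc)]


if __name__ == "__main__":
    # Placeholder accuracies for a 4-layer network, NOT results from the paper.
    placeholder_acc = [0.90, 0.85, 0.80, 0.70, 0.60]
    for layer, share in enumerate(contribution_shares(placeholder_acc), start=1):
        print(f"layer {layer}: {share:.2f} of the total degradation")
```

Because the per-layer drops telescope, they always sum to the total gap between the clean and the skewed model. Under this scheme, a shortcut localized in one layer would show up as a single dominant share, while a distributed shortcut would spread the shares across layers.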
