A Too-Good-to-be-True Prior to Reduce Shortcut Reliance
Nikolay Dagaev, Brett D. Roads, Xiaoliang Luo, Daniel N. Barry, Kaustubh R. Patil, Bradley C. Love

TL;DR
This paper proposes a two-stage training method using low-capacity networks to detect and reduce reliance on superficial shortcuts, thereby improving out-of-distribution generalization in deep networks.
Contribution
It introduces a novel two-stage approach where low-capacity networks identify shortcuts to guide high-capacity networks towards invariant features, enhancing generalization.
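The two-stage idea can be illustrated with a minimal sketch. This summary does not state how the low-capacity network's output guides the high-capacity network, so the rule used here is an assumption made for illustration only: each training example is weighted by one minus the probability the low-capacity network assigns to its true class, and the high-capacity network is trained with a weighted cross-entropy. The function names `shortcut_weights` and `weighted_cross_entropy` are hypothetical, and the probabilities are toy values rather than results from the paper.

```python
import math


def shortcut_weights(lcn_probs, labels):
    """Stage 1 (assumed rule): weight = 1 - LCN probability of the true class.

    Examples the low-capacity network already classifies confidently are
    treated as likely shortcut-solvable and receive a small weight.
    """
    return [1.0 - probs[y] for probs, y in zip(lcn_probs, labels)]


def weighted_cross_entropy(hcn_probs, labels, weights, eps=1e-12):
    """Stage 2: weighted cross-entropy for the high-capacity network.

    Normalized by the sum of weights so the loss scale does not depend on
    how many examples were down-weighted.
    """
    total = sum(weights)
    if total == 0.0:
        return 0.0
    loss = sum(
        -w * math.log(max(probs[y], eps))
        for probs, y, w in zip(hcn_probs, labels, weights)
    )
    return loss / total


# Toy data: three examples, two classes (illustrative values only).
lcn_probs = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]  # hypothetical LCN outputs
labels = [0, 0, 1]
weights = shortcut_weights(lcn_probs, labels)

# Hypothetical HCN outputs at initialization (uniform over classes).
hcn_probs = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
loss = weighted_cross_entropy(hcn_probs, labels, weights)
```

Under this assumed rule, the first example, which the low-capacity network gets right with high confidence, contributes least to the high-capacity network's loss, while the example it misclassifies contributes most.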
Findings
Two-stage approach pairing a low-capacity network (LCN) with a high-capacity network (HCN) reduces shortcut reliance
Method improves out-of-distribution generalization on modified CIFAR-10
LCNs effectively serve as shortcut detectors
Abstract
Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts" - superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships,…
