A Too-Good-to-be-True Prior to Reduce Shortcut Reliance

Nikolay Dagaev; Brett D. Roads; Xiaoliang Luo; Daniel N. Barry; Kaustubh R. Patil; Bradley C. Love

arXiv:2102.06406 · cs.CV · October 22, 2021

TL;DR

This paper proposes a two-stage training method using low-capacity networks to detect and reduce reliance on superficial shortcuts, thereby improving out-of-distribution generalization in deep networks.

Contribution

It introduces a novel two-stage approach where low-capacity networks identify shortcuts to guide high-capacity networks towards invariant features, enhancing generalization.
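This summary does not say how the low-capacity network (LCN) guides the high-capacity network (HCN). As a minimal sketch, assume one plausible mechanism: examples that the LCN already classifies confidently are treated as likely shortcut-solvable and are downweighted in the HCN's training loss. The function names, the `1 - p` weighting rule, and the toy probabilities below are illustrative assumptions, not the authors' implementation.

```python
import math

def shortcut_weights(lcn_probs, labels):
    """Assumed rule: weight = 1 - LCN probability of the true class.

    Examples the LCN finds easy (likely shortcut-solvable) get weights near 0.
    """
    return [1.0 - probs[y] for probs, y in zip(lcn_probs, labels)]

def weighted_cross_entropy(hcn_probs, labels, weights, eps=1e-12):
    """Weighted mean of the HCN's per-example negative log-likelihood."""
    total = sum(weights)
    if total == 0:
        return 0.0  # every example was fully downweighted
    losses = [-math.log(max(probs[y], eps)) for probs, y in zip(hcn_probs, labels)]
    return sum(w * l for w, l in zip(weights, losses)) / total

# Stage 1 (hypothetical outputs): class probabilities from a trained LCN.
lcn_probs = [[0.95, 0.05], [0.40, 0.60], [0.10, 0.90]]
labels = [0, 0, 1]
weights = shortcut_weights(lcn_probs, labels)

# Stage 2 (hypothetical outputs): HCN probabilities scored with the weighted loss.
hcn_probs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
loss = weighted_cross_entropy(hcn_probs, labels, weights)
print(weights, loss)
```

In this toy example the second item, which the LCN misclassifies, receives the largest weight and so dominates the HCN's loss, while the first item, which the LCN solves with high confidence, contributes almost nothing.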

Findings

01. Two-stage LCN-HCN approach reduces shortcut reliance

02. Method improves out-of-distribution generalization on modified CIFAR-10

03. LCNs effectively serve as shortcut detectors

Abstract

Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on "shortcuts" - superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships,…
