Spurious Correlations and Where to Find Them

Gautam Sreekumar; Vishnu Naresh Boddeti

arXiv:2308.11043·cs.LG·August 23, 2023

TL;DR

This paper investigates the causes of spurious correlations in data-driven models, analyzing how different hypotheses influence model behavior using synthetic datasets, and aims to identify indicators to improve mitigation strategies.

Contribution

It systematically studies the hypotheses behind spurious correlations and their impact on ERM models using synthetic data, revealing patterns linked to model design choices.

Findings

1. Identifies key hypotheses associated with spurious correlations.
2. Shows how these hypotheses influence ERM baseline performance.
3. Reveals patterns connecting hypotheses and model design choices.

Abstract

Spurious correlations occur when a model learns unreliable features from the data, and they are a well-known drawback of data-driven learning. Although several algorithms have been proposed to mitigate them, we have yet to jointly derive the indicators of spurious correlations. As a result, solutions built upon standalone hypotheses fail to beat simple ERM baselines. We collect some of the commonly studied hypotheses behind the occurrence of spurious correlations and investigate their influence on standard ERM baselines using synthetic datasets generated from causal graphs. Subsequently, we observe patterns connecting these hypotheses and model design choices.
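As a rough illustration of the kind of experiment the abstract describes, the sketch below samples a toy dataset from a small causal graph and trains an ERM baseline on it. This is not the authors' code: the graph (Y -> X_core, and Y -> A -> X_spur, where A is a spurious attribute that agrees with Y with probability p_spur), the parameter values, and the helper names sample_dataset, fit_erm_logistic and accuracy are hypothetical choices made for illustration. ERM is instantiated here as plain logistic regression trained by full-batch gradient descent, using only NumPy.

```python
import numpy as np


def sample_dataset(n, p_spur, sigma_core=1.0, sigma_spur=0.1, rng=None):
    """Sample from a hypothetical causal graph Y -> X_core and Y -> A -> X_spur.

    A agrees with Y with probability p_spur, so the strength of the spurious
    correlation is controlled by p_spur (an assumed knob, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    y = rng.integers(0, 2, size=n)                  # label Y in {0, 1}
    y_pm = 2 * y - 1                                # signed label in {-1, +1}
    agree = rng.random(n) < p_spur                  # does A copy Y for this sample?
    a_pm = np.where(agree, y_pm, -y_pm)             # spurious attribute A
    x_core = y_pm + sigma_core * rng.standard_normal(n)  # invariant but noisy feature
    x_spur = a_pm + sigma_spur * rng.standard_normal(n)  # clean but unstable feature
    return np.column_stack([x_core, x_spur]), y


def fit_erm_logistic(X, y, lr=0.1, steps=2000):
    """ERM: minimize the mean logistic loss by full-batch gradient descent."""
    Xb = np.column_stack([X, np.ones(len(X))])      # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))           # predicted P(Y = 1 | x)
        grad = Xb.T @ (p - y) / len(y)              # gradient of the mean log-loss
        w -= lr * grad
    return w


def accuracy(w, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    Xb = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((Xb @ w > 0).astype(int) == y))


rng = np.random.default_rng(0)
X_tr, y_tr = sample_dataset(5000, p_spur=0.95, rng=rng)  # correlation holds in training
X_te, y_te = sample_dataset(5000, p_spur=0.05, rng=rng)  # correlation reversed at test time
w = fit_erm_logistic(X_tr, y_tr)

print("weights (core, spur, bias):", np.round(w, 2))
print("train accuracy:", accuracy(w, X_tr, y_tr))
print("shifted test accuracy:", accuracy(w, X_te, y_te))
```

Under these assumed settings, ERM is expected to place substantial weight on x_spur because it is nearly noise-free in training, so accuracy on the reversed test split should fall well below training accuracy. Exact numbers depend on the seed, the noise levels and the number of gradient steps.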

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Data Mining Algorithms and Applications