Spurious Correlations and Where to Find Them
Gautam Sreekumar, Vishnu Naresh Boddeti

TL;DR
This paper investigates why data-driven models learn spurious correlations. Using synthetic datasets, it analyzes how commonly studied hypotheses about their causes affect model behavior, with the aim of identifying indicators that can guide mitigation strategies.
Contribution
It systematically studies the hypotheses behind spurious correlations and their impact on ERM models using synthetic data, revealing patterns linked to model design choices.
Findings
Identifies key hypotheses associated with spurious correlations.
Shows how these hypotheses influence ERM baseline performance.
Reveals patterns connecting hypotheses and model design choices.
Abstract
Spurious correlations occur when a model learns unreliable features from the data and are a well-known drawback of data-driven learning. Although several algorithms have been proposed to mitigate them, we are yet to jointly derive the indicators of spurious correlations. As a result, the solutions built upon standalone hypotheses fail to beat simple ERM baselines. We collect some of the commonly studied hypotheses behind the occurrence of spurious correlations and investigate their influence on standard ERM baselines using synthetic datasets generated from causal graphs. Subsequently, we observe patterns connecting these hypotheses and model design choices.
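To make the setting concrete, here is a minimal sketch of how an ERM model can pick up a spurious feature on synthetic data. It is not the authors' code or their causal graphs: the two-feature graph (label causes a noisy core feature and a cleaner, spuriously aligned feature), the agreement rate `p_agree`, the noise levels, and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def make_data(n, p_agree, rng, core_noise=1.0, spur_noise=0.1):
    """Sample from a hypothetical graph: y -> x_core and y -> x_spur.

    x_spur agrees with the label with probability p_agree, so changing
    p_agree between train and test simulates a broken spurious correlation.
    """
    y = rng.integers(0, 2, size=n)
    s = 2 * y - 1                                  # labels mapped to {-1, +1}
    agree = rng.random(n) < p_agree
    spur_sign = np.where(agree, s, -s)
    x_core = s + core_noise * rng.normal(size=n)   # reliable but noisy
    x_spur = spur_sign + spur_noise * rng.normal(size=n)  # clean but unreliable
    return np.stack([x_core, x_spur], axis=1), y

def train_erm(X, y, lr=0.1, steps=2000):
    """Plain ERM: logistic regression fit by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                                  # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return float((((X @ w + b) > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
X_train, y_train = make_data(5000, p_agree=0.95, rng=rng)
X_iid, y_iid = make_data(5000, p_agree=0.95, rng=rng)      # same correlation as training
X_shift, y_shift = make_data(5000, p_agree=0.05, rng=rng)  # correlation reversed

w, b = train_erm(X_train, y_train)
acc_iid = accuracy(w, b, X_iid, y_iid)
acc_shift = accuracy(w, b, X_shift, y_shift)
print("weights (core, spurious):", w)
print("accuracy, same distribution:", acc_iid)
print("accuracy, reversed correlation:", acc_shift)
```

In this toy setup the ERM model places positive weight on the spurious feature because it is predictive on the training distribution, and accuracy falls once that correlation is reversed. Varying `p_agree`, the noise levels, or the model is one simple way to probe how such design choices interact with a given hypothesis.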
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Data Mining Algorithms and Applications
