Out of Spuriousity: Improving Robustness to Spurious Correlations without Group Annotations
Phuong Quynh Le, Jörg Schlötterer, Christin Seifert

TL;DR
This paper introduces a method to improve model robustness against spurious correlations by extracting a subnetwork that relies solely on invariant features, without needing group annotations.
Contribution
It proposes a novel approach using supervised contrastive loss to unlearn spurious correlations and extract invariant subnetworks from fully trained models.
Findings
Increases worst-group performance significantly.
Works with multiple spurious attributes without prior attribute labels.
Supports the hypothesis of invariant feature subnetworks in dense networks.
Abstract
Machine learning models are known to learn spurious correlations, i.e., features that are strongly related to class labels but have no causal relation to them. Relying on those correlations leads to poor performance on the data groups without these correlations and to poor generalization ability. To improve the robustness of machine learning models to spurious correlations, we propose an approach to extract a subnetwork from a fully trained network that does not rely on spurious correlations. The subnetwork is found under the assumption that data points with the same spurious attribute will be close to each other in the representation space when training with ERM; we then employ supervised contrastive loss in a novel way to force models to unlearn the spurious connections. The increase in the worst-group performance of our approach contributes to strengthening the hypothesis that there exists a subnetwork…
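For readers unfamiliar with the loss the abstract refers to, the following is a minimal sketch of the standard supervised contrastive loss in plain Python. It is not the authors' implementation: the abstract says the loss is used "in a novel way" and does not specify how positives and negatives are chosen, so the function name `supervised_contrastive_loss`, the `temperature` default, and the toy data are assumptions made for illustration only.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss (hypothetical helper, not the paper's variant).

    For each anchor, samples sharing its label are positives and all other
    samples form the contrast set. The result is averaged over anchors that
    have at least one positive.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled similarity between the anchor and every other sample.
        sims = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy 2-D representations: the first two points are close, as are the last two.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supervised_contrastive_loss(points, [0, 0, 1, 1])
loss_misaligned = supervised_contrastive_loss(points, [0, 1, 0, 1])
print(loss_aligned < loss_misaligned)
```

The loss is small when samples with the same label already sit together in representation space and large when they do not, which is why minimizing it pulls same-label samples together. Which labels the paper feeds into the loss is not stated in the abstract above.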
