Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples

Weiwei Li; Junzhuo Liu; Yuanyuan Ren; Yuchen Zheng; Yahao Liu; Wen Li

arXiv:2512.22874·cs.CV·December 30, 2025

TL;DR

This paper introduces a data-driven method that reduces spurious correlations in deep learning models by exploiting how samples cluster in the learned feature space, yielding less biased models with significantly improved worst-group accuracy.

Contribution

The paper proposes a pipeline that identifies, neutralizes, and eliminates spurious features through sample clustering and feature transformation, improving model robustness.

Findings

01. Over 20% improvement in worst-group accuracy on benchmarks

02. Effective on both image and NLP debiasing tasks

03. Significantly outperforms standard empirical risk minimization (ERM)
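
Worst-group accuracy, the metric behind the first finding, is the minimum accuracy over all groups, where a group is commonly defined as a (class label, spurious attribute) pair. The sketch below shows how it can differ sharply from average accuracy. The function name, the group definition, and the toy data are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (worst accuracy, per-group accuracy) over hashable group ids."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    per_group = {g: correct[g] / total[g] for g in total}
    return min(per_group.values()), per_group


# Hypothetical toy data: the spurious attribute agrees with the label
# for most samples, and the "model" simply predicts the spurious attribute.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
spurious = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = list(spurious)

# Assumption: a group is a (class label, spurious attribute) pair.
groups = list(zip(y_true, spurious))

worst, per_group = worst_group_accuracy(y_true, y_pred, groups)
average = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"average accuracy:     {average:.2f}")
print(f"worst-group accuracy: {worst:.2f}")
```

In this toy setup the shortcut predictor looks acceptable on average while failing completely on the minority groups, which is the gap that worst-group accuracy is meant to expose.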

Abstract

Deep learning models are known to often learn features that spuriously correlate with the class label during training but are irrelevant to the prediction task. Existing methods typically address this issue by annotating potential spurious attributes, or filtering spurious features based on some empirical assumptions (e.g., simplicity of bias). However, these methods may yield unsatisfactory performance due to the intricate and elusive nature of spurious correlations in real-world data. In this paper, we propose a data-oriented approach to mitigate the spurious correlation in deep learning models. We observe that samples that are influenced by spurious features tend to exhibit a dispersed distribution in the learned feature space. This allows us to identify the presence of spurious features. Subsequently, we obtain a bias-invariant representation by neutralizing the spurious features…
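
The observation about dispersion in the learned feature space can be illustrated with a very simple proxy: the distance of each sample's feature vector from the centroid of its own class. This is only a sketch of the general idea under that assumption. It is not the paper's procedure, and the function names, the centroid-distance score, and the 2-D toy features are all hypothetical.

```python
import math
from collections import defaultdict


def class_centroids(features, labels):
    """Mean feature vector per class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def dispersion_scores(features, labels):
    """Euclidean distance of each sample to its own class centroid."""
    centroids = class_centroids(features, labels)
    return [math.dist(x, centroids[y]) for x, y in zip(features, labels)]


# Hypothetical 2-D "learned features": three tightly clustered samples
# per class plus one sample per class that sits far from its cluster.
features = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0),
            (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (2.0, 2.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

scores = dispersion_scores(features, labels)

# Index of the most dispersed sample in each class.
most_dispersed = {}
for i, (s, y) in enumerate(zip(scores, labels)):
    if y not in most_dispersed or s > scores[most_dispersed[y]]:
        most_dispersed[y] = i

print(most_dispersed)
```

A real pipeline would take features from a trained network rather than hand-written points, and would likely need a more robust notion of dispersion than a single centroid distance.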

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI