A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective

Yeonsung Jung; Jaeyun Song; June Yong Yang; Jin-Hwa Kim; Sung-Yub Kim; Eunho Yang

arXiv:2411.00360·cs.LG·November 4, 2024

TL;DR

This paper mitigates dataset bias by detecting bias-conflicting samples with influence functions, an approach inspired by mislabeled sample detection. The authors report that it improves model fairness and complements existing debiasing methods.

Contribution

It introduces a simple method using influence functions to identify bias-conflicting samples, enhancing bias detection and model rectification.

Findings

1. Improves detection precision of bias-conflicting samples.
2. Enhances model fairness when applied to biased datasets.
3. Complements existing debiasing techniques.

Abstract

Learning generalized models from biased data is an important undertaking toward fairness in deep learning. To address this issue, recent studies attempt to identify and leverage bias-conflicting samples free from spurious correlations without prior knowledge of bias or an unbiased set. However, spurious correlation remains an ongoing challenge, primarily due to the difficulty in precisely detecting these samples. In this paper, inspired by the similarities between mislabeled samples and bias-conflicting samples, we approach this challenge from a novel perspective of mislabeled sample detection. Specifically, we delve into Influence Function, one of the standard methods for mislabeled sample detection, for identifying bias-conflicting samples and propose a simple yet effective remedy for biased models by leveraging them. Through comprehensive analysis and experiments on diverse datasets,…
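To make the idea in the abstract concrete, the following is a minimal sketch of a self-influence score, the quantity g_i^T H^{-1} g_i, where g_i is the loss gradient of training sample i and H is the Hessian of the training loss. It is not the authors' implementation: the logistic-regression model, the exact (undamped apart from L2) Hessian, the toy data, and every function name here are assumptions chosen so the example stays small and runnable. Flipped labels stand in for the hard-to-fit samples the abstract compares bias-conflicting samples to.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logreg(X, y, l2=1e-2, lr=0.5, steps=500):
    """Plain gradient descent on the L2-regularized logistic loss (illustrative only)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def self_influence(X, y, w, l2=1e-2):
    """Return g_i^T H^{-1} g_i for every training sample i."""
    n, d = X.shape
    p = sigmoid(X @ w)
    # Per-sample gradients of the unregularized logistic loss, shape (n, d).
    grads = (p - y)[:, None] * X
    # Hessian of the mean loss; the L2 term also keeps H well conditioned.
    H = (X * (p * (1 - p))[:, None]).T @ X / n + l2 * np.eye(d)
    # Solve H v_i = g_i for all samples at once instead of inverting H.
    H_inv_grads = np.linalg.solve(H, grads.T).T
    return np.sum(grads * H_inv_grads, axis=1)

# Toy data (an assumption for illustration): the label depends on feature 0,
# and 10 of 200 labels are flipped.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)
flipped = rng.choice(200, size=10, replace=False)
y[flipped] = 1.0 - y[flipped]

w = fit_logreg(X, y)
scores = self_influence(X, y, w)

# Rank samples by score; high scores flag samples the model fits poorly.
top_k = np.argsort(scores)[::-1][:10]
print("flipped samples among top-10 scores:", np.intersect1d(top_k, flipped).size)
```

In this sketch, samples whose labels disagree with the dominant pattern tend to receive higher scores than the rest, which is the kind of signal a detection step could threshold or rank on. For deep networks the exact Hessian solve used here would not be practical, so an approximation would be needed; which one the paper uses is not stated in the text above.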

Videos

A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective (SlidesLive)

Taxonomy

Topics: Machine Learning and Data Classification