Model Debiasing by Learnable Data Augmentation

Pietro Morerio; Ruggero Ragonesi; Vittorio Murino

arXiv:2408.04955·cs.LG·August 12, 2024

Open Access

TL;DR

This paper introduces a novel two-stage data augmentation method to improve neural network generalization on biased datasets, effectively reducing reliance on spurious correlations without requiring bias annotations.

Contribution

The work presents a bias-agnostic data augmentation pipeline that identifies biased samples and enhances model robustness, outperforming existing methods on synthetic and real datasets.

Findings

01 Achieves state-of-the-art accuracy on biased datasets

02 Improves generalization regardless of bias level

03 Robust performance on both synthetic and real-world data

Abstract

Deep Neural Networks are well known for efficiently fitting training data, yet they generalize poorly whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In essence, such models are prone to learning spurious correlations between data and labels. In this work, we tackle the problem of learning from biased data in the very realistic unsupervised scenario, i.e., when the bias is unknown. This is a much harder task than the supervised case, where auxiliary, bias-related annotations can be exploited in the learning process. This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training. First, biased/unbiased samples are identified by training over-biased models. Second, such subdivision (typically noisy) is exploited within a data…
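The first stage described in the abstract, separating biased from unbiased samples with an over-biased model, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes, purely for illustration, that a sample counts as "biased" when a model relying on the spurious attribute predicts its label correctly, and as "unbiased" otherwise. The helper `split_by_biased_model`, the synthetic data, and the 95% bias level are all hypothetical, and the stand-in "model" simply outputs the spurious attribute.

```python
import numpy as np


def split_by_biased_model(biased_pred, labels):
    """Split samples using the predictions of a (hypothetical) over-biased model.

    Assumption: samples the biased model classifies correctly are treated as
    bias-aligned ("biased"); the rest are treated as bias-conflicting ("unbiased").
    Returns two boolean masks of the same length as `labels`.
    """
    biased_pred = np.asarray(biased_pred)
    labels = np.asarray(labels)
    biased_mask = biased_pred == labels
    return biased_mask, ~biased_mask


# Toy data: binary labels plus a spurious attribute that agrees with the
# label for roughly 95% of the samples (an assumed bias level).
rng = np.random.default_rng(0)
n = 1000
labels = rng.integers(0, 2, size=n)
agree = rng.random(n) < 0.95
spurious = np.where(agree, labels, 1 - labels)

# Stand-in for an over-biased model: it predicts the spurious attribute only.
biased_pred = spurious

biased_mask, unbiased_mask = split_by_biased_model(biased_pred, labels)
print("biased samples:  ", int(biased_mask.sum()))
print("unbiased samples:", int(unbiased_mask.sum()))
```

In a real setting the split would be noisy, as the abstract notes, because a trained model also fits some bias-conflicting samples and misses some bias-aligned ones. The toy stand-in above recovers the split exactly only because it reads the spurious attribute directly.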

Taxonomy

Topics: Advanced Data Processing Techniques