From Hope to Safety: Unlearning Biases of Deep Models via Gradient Penalization in Latent Space

Maximilian Dreyer; Frederik Pahde; Christopher J. Anders; Wojciech Samek; Sebastian Lapuschkin

arXiv:2308.09437·cs.LG·December 19, 2023

TL;DR

This paper introduces a novel concept-level bias correction method for deep neural networks using gradient penalization in latent space, effectively reducing biases across various datasets and architectures.

Contribution

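The paper proposes a bias mitigation approach for deep models that penalizes gradients along bias directions in concept space. It targets two limitations of earlier post-hoc methods: input-level methods need annotations that are only available for spatially localized biases, and latent-space augmentation merely hopes to enforce the right reasons.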

Findings

01 Effective bias reduction on multiple datasets

02 Works across different architectures such as VGG, ResNet, and EfficientNet

03 Code available for reproducibility

Abstract

Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stake decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method for model correction on the concept level that explicitly reduces model sensitivity towards biases via gradient penalization. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and…
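
To make the idea of gradient penalization along a bias direction concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (the official repository is in PyTorch). Everything in it is an assumption made for illustration: the two-dimensional toy "latent" activations, the linear head f(a) = w · a, the mean-difference direction used as a simple stand-in for a Concept Activation Vector, and the names `mean_difference_direction`, `train_linear_head`, and `lam`. For a linear head the gradient of the output with respect to the activations is just w, so the penalty reduces to lam * (w · h)^2.

```python
# Toy sketch only: data, linear head, mean-difference direction and lam are
# illustrative assumptions, not the authors' implementation.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mean_difference_direction(acts, has_concept):
    """Unit vector from the mean activation without the concept to the mean with it."""
    pos = [a for a, t in zip(acts, has_concept) if t]
    neg = [a for a, t in zip(acts, has_concept) if not t]
    diff = [
        sum(a[i] for a in pos) / len(pos) - sum(a[i] for a in neg) / len(neg)
        for i in range(len(acts[0]))
    ]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def train_linear_head(acts, targets, h, lam, steps=2000, lr=0.05):
    """Gradient descent on mean squared error + lam * (grad_a f . h)^2.

    For the linear head f(a) = w . a, grad_a f equals w, so the penalty
    term is lam * (w . h)^2 and its gradient is 2 * lam * (w . h) * h.
    """
    w = [0.0] * len(acts[0])
    n = len(acts)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for a, y in zip(acts, targets):
            err = dot(w, a) - y
            for i, a_i in enumerate(a):
                grad[i] += 2.0 * err * a_i / n
        sensitivity = dot(w, h)  # output sensitivity along the bias direction
        for i, h_i in enumerate(h):
            grad[i] += 2.0 * lam * sensitivity * h_i
        w = [w_i - lr * g_i for w_i, g_i in zip(w, grad)]
    return w

# Each activation is (core feature, artifact feature). The artifact agrees
# with the label on 6 of 8 samples, so an unpenalized head picks it up.
acts = [(0.5, 1.0), (0.2, 1.0), (-0.1, 1.0), (0.4, -1.0),
        (-0.5, -1.0), (-0.2, -1.0), (0.1, -1.0), (-0.4, 1.0)]
targets = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]
has_concept = [a[1] > 0 for a in acts]

h = mean_difference_direction(acts, has_concept)
w_plain = train_linear_head(acts, targets, h, lam=0.0)
w_pen = train_linear_head(acts, targets, h, lam=10.0)

print("sensitivity along h without penalty:", abs(dot(w_plain, h)))
print("sensitivity along h with penalty:   ", abs(dot(w_pen, h)))
```

In this toy setting the penalized head ends up far less sensitive along `h` than the unpenalized one. With a deep network the gradient with respect to the latent activations would come from automatic differentiation rather than a closed form, and the choice of direction matters: the abstract notes that regression-based directions such as those from Support Vector Machines tend to diverge, which this sketch does not attempt to reproduce.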

Code & Models

Repositories

frederikpahde/rrclarc
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · COVID-19 diagnosis using AI

Methods: Pointwise Convolution · Depthwise Convolution · Depthwise Separable Convolution · Sigmoid Activation · Squeeze-and-Excitation Block · 1x1 Convolution · Kaiming Initialization · Residual Connection · Inverted Residual Block