Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Lukas Hauzenberger; Shahed Masoudian; Deepak Kumar; Markus Schedl; Navid Rekabsaz

arXiv:2205.15171·cs.LG·June 6, 2023·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces a modular bias mitigation method using sparse attribute-removal subnetworks that can be integrated on-demand at inference time, offering flexible and effective debiasing without retraining the entire model.

Contribution

The paper presents a novel modular approach with sparse subnetworks for bias mitigation, enabling selective and on-demand debiasing at inference time.

Findings

01. Maintains task performance while improving bias mitigation effectiveness.

02. Uses subnetworks effectively for selective bias mitigation.

03. Achieves bias reduction comparable to or better than baseline fine-tuning.

Abstract

Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria, and update the model to reach a new debiased state. However, in practice, end-users and practitioners might prefer to switch back to the original model, or apply debiasing only on a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach, consisting of stand-alone highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on-demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct…
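The on-demand integration described in the abstract can be pictured with a small sketch. It is an illustration only, not the code from the paper's repository: the names `apply_diffs` and `sparsity`, the attribute labels, and the toy parameter values are all hypothetical, and it assumes that each debiasing module is a sparse set of parameter differences that is simply added to the frozen core weights, with several modules combined by summing their differences.

```python
from typing import Dict, Iterable, List

# Hypothetical types and names; this is not the API of the paper's repository.
Params = Dict[str, List[float]]            # dense core parameters, keyed by tensor name
SparseDiff = Dict[str, Dict[int, float]]   # per tensor: {flat index: delta}


def apply_diffs(core: Params, diffs: Iterable[SparseDiff]) -> Params:
    """Return core parameters plus every selected sparse diff (assumed additive)."""
    # Copy so the core model stays intact and can be restored at any time.
    merged = {name: list(values) for name, values in core.items()}
    for diff in diffs:
        for name, entries in diff.items():
            for index, delta in entries.items():
                merged[name][index] += delta
    return merged


def sparsity(diff: SparseDiff, core: Params) -> float:
    """Fraction of core parameters that a diff leaves untouched."""
    total = sum(len(values) for values in core.values())
    touched = sum(len(entries) for entries in diff.values())
    return 1.0 - touched / total


# Toy core model and two illustrative attribute-removal modules.
core = {"layer.weight": [0.5, -1.0, 2.0, 0.0], "layer.bias": [0.25, 0.0]}
modules = {
    "gender": {"layer.weight": {1: 0.5}},
    "age": {"layer.weight": {3: -0.25}, "layer.bias": {0: 0.25}},
}

original = apply_diffs(core, [])                     # no debiasing: the core model
gender_only = apply_diffs(core, [modules["gender"]])  # debias one attribute
both = apply_diffs(core, modules.values())            # debias both attributes
```

Because the core parameters are never modified, switching back to the original model amounts to selecting no modules, and debiasing a subset of attributes amounts to selecting only the corresponding diffs.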

Code & Models

Repositories

sirluk/sparse_transformers
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Interpreting and Communication in Healthcare