Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks
Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz

TL;DR
This paper introduces a modular bias mitigation method using sparse attribute-removal subnetworks that can be integrated on-demand at inference time, offering flexible and effective debiasing without retraining the entire model.
Contribution
The paper presents a novel modular approach with sparse subnetworks for bias mitigation, enabling selective and on-demand debiasing at inference time.
Findings
Maintains task performance while improving bias mitigation effectiveness.
Effective utilization of subnetworks for selective bias mitigation.
Comparable or better bias reduction compared to baseline finetuning.
Abstract
Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria, and update the model to reach a new debiased state. However, in practice, end-users and practitioners might prefer to switch back to the original model, or apply debiasing only on a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach, consisting of stand-alone highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on-demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct…
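The on-demand idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only assumes, following the general idea of diff pruning, that each debiasing module is a sparse set of offsets added to the core weights. The flat weight list, the function name apply_diffs, the attribute names "gender" and "age", and the choice to combine several modules by simple addition are all hypothetical and used for illustration only.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: the core model is a flat list of floats, and
# each debiasing "subnetwork" is a sparse diff mapping parameter index -> offset.
Weights = List[float]
SparseDiff = Dict[int, float]


def apply_diffs(core: Weights, diffs: Iterable[SparseDiff]) -> Weights:
    """Return a copy of the core weights with the selected sparse diffs added."""
    merged = list(core)  # copy, so the original core model stays recoverable
    for diff in diffs:
        for index, offset in diff.items():
            merged[index] += offset  # assumption: modules combine additively
    return merged


# Toy core weights and two illustrative attribute-removal modules.
core_weights = [0.5, -1.0, 2.0, 0.0]
modules = {
    "gender": {1: 0.25},
    "age": {1: 0.25, 3: -0.5},
}

# Choose at inference time which modules to switch on.
original = apply_diffs(core_weights, [])
gender_only = apply_diffs(core_weights, [modules["gender"]])
both = apply_diffs(core_weights, [modules["gender"], modules["age"]])
```

Because the diffs are stored separately and the core weights are never modified, switching back to the original model amounts to applying no modules at all.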
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Interpreting and Communication in Healthcare
