Unlabeled Debiasing in Downstream Tasks via Class-wise Low Variance Regularization
Shahed Masoudian, Markus Frohmann, Navid Rekabsaz, Markus Schedl

TL;DR
This paper proposes a novel, label-free debiasing regularization method based on class-wise embedding variance, effectively reducing societal biases in language models during downstream tasks without requiring protected attribute labels.
Contribution
The authors introduce a class-wise variance regularization technique that mitigates bias without attribute labels, applies to any attribute, and is effective across multiple datasets.
Findings
Outperforms existing debiasing methods relying on attribute labels.
Maintains task performance while reducing bias.
Effective across encoder models and diverse datasets.
Abstract
Language models frequently inherit societal biases from their training data. Numerous techniques have been proposed to mitigate these biases during both the pre-training and fine-tuning stages. However, fine-tuning a pre-trained debiased language model on a downstream task can reintroduce biases into the model. Additionally, existing debiasing methods for downstream tasks either (i) require labels of protected attributes (e.g., age, race, or political views) that are often not available or (ii) rely on indicators of bias, which restricts their applicability to gender debiasing since they rely on gender-specific words. To address this, we introduce a novel debiasing regularization technique based on the class-wise variance of embeddings. Crucially, our method does not require attribute labels and targets any attribute, thus addressing the shortcomings of existing debiasing methods. Our…
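The abstract describes a regularizer built on the class-wise variance of embeddings but does not give the exact loss, so the following is only a minimal sketch under stated assumptions. It assumes the penalty is the per-dimension variance of embeddings within each task class, averaged over dimensions and classes, and added to the task loss with a weight. The names `class_wise_variance`, `regularized_loss`, and `lam` are hypothetical and are not taken from the paper. In practice this quantity would be computed on differentiable tensors inside a training loop; plain Python lists are used here only to keep the example self-contained.

```python
from collections import defaultdict


def class_wise_variance(embeddings, labels):
    """Average within-class variance of embeddings (assumed form of the penalty).

    embeddings: list of equal-length lists of floats, one per example.
    labels: task-class labels (not protected-attribute labels).
    """
    by_class = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        by_class[label].append(vec)

    per_class = []
    for vecs in by_class.values():
        n, dim = len(vecs), len(vecs[0])
        # Per-dimension mean of the embeddings in this class.
        mean = [sum(v[d] for v in vecs) / n for d in range(dim)]
        # Population variance per dimension, then averaged over dimensions.
        var = [sum((v[d] - mean[d]) ** 2 for v in vecs) / n for d in range(dim)]
        per_class.append(sum(var) / dim)

    # Average the per-class values so every class contributes equally.
    return sum(per_class) / len(per_class)


def regularized_loss(task_loss, embeddings, labels, lam=0.1):
    # lam is a hypothetical hyperparameter weighting the variance penalty.
    return task_loss + lam * class_wise_variance(embeddings, labels)
```

Note that only the downstream task labels are used to group the embeddings, which is consistent with the claim that no protected attribute labels are required.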
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Advanced Neural Network Applications
Methods: Low Variance Regularization
