FairDistillation: Mitigating Stereotyping in Language Models
Pieter Delobelle, Bettina Berendt

TL;DR
FairDistillation is a cross-lingual knowledge distillation method that creates smaller, fairer language models by reducing stereotypes without sacrificing task performance, at a lower computational cost.
Contribution
We introduce FairDistillation, a novel cross-lingual knowledge distillation approach that mitigates stereotypes in language models efficiently across multiple languages.
Findings
Reduces stereotyping and biases in language models
Maintains downstream task performance
Lower computational cost compared to existing methods
Abstract
Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expensive when tackling multiple languages or when considering different biases. To address this, we present FairDistillation: a cross-lingual method based on knowledge distillation to construct smaller language models while controlling for specific biases. We found that our distillation method does not negatively affect the downstream performance on most tasks and successfully mitigates stereotyping and representational harms. We demonstrate that FairDistillation can create fairer language models at a considerably lower cost than alternative…
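As a rough illustration of what "knowledge distillation while controlling for specific biases" could look like, the sketch below adjusts a teacher's token distribution before it is used as a soft target for the student. This is a minimal sketch under stated assumptions, not the authors' implementation: the toy vocabulary, the logits, and the helper names `equalize_pair` and `distillation_loss` are all hypothetical. Averaging the probabilities of a token pair is only one possible way to encode a bias constraint; the abstract does not say which constraint the method uses.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def equalize_pair(probs, idx_a, idx_b):
    """Hypothetical bias constraint: give two tokens the same probability.

    The pair's combined probability mass is preserved, so the result
    is still a valid distribution.
    """
    adjusted = list(probs)
    mean = (probs[idx_a] + probs[idx_b]) / 2.0
    adjusted[idx_a] = mean
    adjusted[idx_b] = mean
    return adjusted

def distillation_loss(target_probs, student_logits, temperature=1.0):
    """Cross-entropy between the soft targets and the student's predictions."""
    student_probs = softmax(student_logits, temperature)
    return -sum(
        t * math.log(s)
        for t, s in zip(target_probs, student_probs)
        if t > 0
    )

# Toy example (all values are made up for illustration).
vocab = ["he", "she", "nurse", "the"]
teacher_logits = [2.0, 0.5, 0.1, -1.0]   # teacher prefers "he" over "she"
student_logits = [1.0, 1.0, 0.0, -1.0]   # student treats both equally

teacher_probs = softmax(teacher_logits)
fair_targets = equalize_pair(
    teacher_probs, vocab.index("he"), vocab.index("she")
)
loss = distillation_loss(fair_targets, student_logits)
print(f"distillation loss against adjusted targets: {loss:.4f}")
```

In this toy setup the student is trained toward the adjusted targets rather than the teacher's raw output, so a student that treats the two tokens symmetrically incurs a lower loss than one that copies the teacher's preference.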
Taxonomy
Topics: Topic Modeling
Methods: Knowledge Distillation
