TL;DR
This paper introduces a general framework for debiasing distributional word vectors. It handles both explicit and implicit bias specifications, applies across multiple languages, and preserves the semantic information in the embeddings.
Contribution
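It proposes a unified debiasing framework with three composable debiasing models that operate on explicit or implicit bias specifications, together with an evaluation suite that couples existing bias metrics with newly proposed ones. The framework applies across languages and embedding methods.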
Findings
The debiasing models reduce the measured biases without harming the semantic content of the embeddings.
The framework is robust and applies to multiple languages and embedding types.
Cross-lingual transfer of debiasing models reduces biases in low-resource languages.
Abstract
Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones.…
