Dictionary-based Debiasing of Pre-trained Word Embeddings
Masahiro Kaneko, Danushka Bollegala

TL;DR
This paper introduces a dictionary-based method for debiasing pre-trained word embeddings that automatically learns constraints from dictionary definitions, effectively reducing biases without needing original training data or bias-specific word lists.
Contribution
It presents a novel debiasing approach that uses dictionary definitions to learn debiasing constraints automatically, so it requires neither predefined bias types (in the form of word lists) nor access to the original training resources.
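The method draws its constraints from dictionary definitions. One simple way to turn a gloss into something comparable with a word vector is to average the embeddings of its content words. The sketch below assumes that averaging scheme; it is an illustration and not necessarily how the authors encode definitions. The vectors, the stopword list and the helper names `definition_vector` and `cosine` are hypothetical, and axis 1 is a made-up "bias" axis used only for illustration.

```python
import math

# Hypothetical 3-d toy vectors (real pre-trained vectors are much larger).
# Axis 1 is a made-up "bias" axis, for illustration only.
EMBEDDINGS = {
    "nurse": [0.9, 0.8, 0.1],
    "person": [0.5, 0.1, 0.4],
    "trained": [0.3, 0.0, 0.6],
    "care": [0.6, 0.1, 0.5],
    "sick": [0.4, 0.0, 0.7],
}
STOPWORDS = {"a", "the", "to", "for", "of"}


def definition_vector(gloss, emb, stopwords=STOPWORDS):
    """Average the embeddings of the in-vocabulary content words of a gloss."""
    tokens = [t for t in gloss.lower().split() if t not in stopwords and t in emb]
    if not tokens:
        raise ValueError("no in-vocabulary content words in gloss")
    dim = len(next(iter(emb.values())))
    return [sum(emb[t][k] for t in tokens) / len(tokens) for k in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


d_nurse = definition_vector("a person trained to care for the sick", EMBEDDINGS)
agreement = cosine(EMBEDDINGS["nurse"], d_nurse)
```

In this toy setup the gloss contains no gendered words, so `d_nurse` sits at 0.05 on the made-up bias axis, while the pre-trained `nurse` vector sits at 0.8. `agreement` measures how closely the pre-trained vector matches its definition.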
Findings
Successfully removes biases from word embeddings
Preserves semantic information in embeddings
Outperforms existing debiasing methods on benchmarks
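This summary does not name the benchmarks behind these findings. A common way to check that word embeddings preserve semantic information is a word-similarity benchmark. Cosine similarities between word pairs are correlated with human ratings using Spearman's rho, and the score is compared before and after debiasing. The sketch below assumes that protocol. The function names and the `(word1, word2, human_score)` benchmark format are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(vx * vy)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity_score(emb, benchmark):
    """benchmark: list of (word1, word2, human_score); returns Spearman's rho."""
    model = [cosine(emb[w1], emb[w2]) for w1, w2, _ in benchmark]
    human = [score for _, _, score in benchmark]
    return spearman(model, human)
```

Running `similarity_score` on both the original and the debiased embeddings, with the same benchmark, shows whether debiasing degraded semantic similarity judgements.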
Abstract
Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge regarding the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists, and it automatically learns, from the dictionary definitions of words, the constraints that unbiased word embeddings must satisfy. Specifically, we learn an encoder to generate a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embeddings,…
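The abstract describes an encoder that maps each pre-trained vector to a debiased one. Among other criteria, which are cut off above, the output must retain the original semantics, and the TL;DR says the constraints come from dictionary definitions. The sketch below is a toy under stated assumptions, not the authors' model. A single linear encoder `E` is trained by gradient descent on two squared-error terms:

1. One term keeps `E w` close to the pre-trained vector `w`, as a crude stand-in for criterion (a).
2. The other pulls `E w` toward a definition vector `d`, which is an assumed form of the dictionary constraint.

The names `objective`, `train_encoder` and `lam`, and the toy `PAIRS`, are hypothetical. The remaining criteria from the full abstract are not modelled.

```python
def matvec(E, v):
    """Apply the linear encoder E (a list of rows) to vector v."""
    return [sum(e * x for e, x in zip(row, v)) for row in E]


def objective(E, pairs, lam):
    """Mean over (w, d) pairs of ||Ew - w||^2 + lam * ||Ew - d||^2, plus its gradient w.r.t. E."""
    n = len(E)
    grad = [[0.0] * n for _ in range(n)]
    total = 0.0
    for w, d in pairs:
        z = matvec(E, w)
        for i in range(n):
            r_sem = z[i] - w[i]    # residual of the semantic-retention term
            r_dict = z[i] - d[i]   # residual of the (assumed) dictionary-agreement term
            total += r_sem ** 2 + lam * r_dict ** 2
            g = 2.0 * r_sem + 2.0 * lam * r_dict  # dLoss/dz_i
            for j in range(n):
                grad[i][j] += g * w[j]  # dz_i / dE[i][j] = w[j]
    m = len(pairs)
    return total / m, [[g / m for g in row] for row in grad]


def train_encoder(pairs, lam=1.0, lr=0.1, steps=300):
    """Plain gradient descent, starting from the identity map (no change to the embeddings)."""
    n = len(pairs[0][0])
    E = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        _, grad = objective(E, pairs, lam)
        E = [[E[i][j] - lr * grad[i][j] for j in range(n)] for i in range(n)]
    return E


# Hypothetical toy data: (pre-trained vector w, definition vector d).
PAIRS = [
    ([1.0, 0.2], [0.9, 0.0]),
    ([0.1, 1.0], [0.2, 0.8]),
    ([0.7, 0.7], [0.8, 0.5]),
]
E = train_encoder(PAIRS)
```

The objective is a convex quadratic in `E`, so a small enough step size (here `lr=0.1`) makes each gradient step reduce the loss. `lam` trades semantic retention against agreement with the definitions.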
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Hate Speech and Cyberbullying Detection
