Dictionary-based Debiasing of Pre-trained Word Embeddings

Masahiro Kaneko; Danushka Bollegala

arXiv:2101.09525·cs.CL·January 26, 2021·5 cites

TL;DR

This paper introduces a dictionary-based method for debiasing pre-trained word embeddings that automatically learns constraints from dictionary definitions, effectively reducing biases without needing original training data or bias-specific word lists.

Contribution

It presents a novel debiasing approach that leverages dictionaries to automatically learn constraints, avoiding the need for predefined bias types or training resources.

Findings

01 Successfully removes biases from word embeddings

02 Preserves semantic information in embeddings

03 Outperforms existing debiasing methods on benchmarks

Abstract

Word embeddings trained on large corpora have been shown to encode high levels of unfair discriminatory gender, racial, religious and ethnic biases. In contrast, human-written dictionaries describe the meanings of words in a concise, objective and unbiased manner. We propose a method for debiasing pre-trained word embeddings using dictionaries, without requiring access to the original training resources or any knowledge of the word embedding algorithms used. Unlike prior work, our proposed method does not require the types of biases to be pre-defined in the form of word lists; instead, it automatically learns the constraints that unbiased word embeddings must satisfy from the dictionary definitions of the words. Specifically, we learn an encoder to generate a debiased version of an input word embedding such that it (a) retains the semantics of the pre-trained word embeddings,…
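
As a rough illustration of criterion (a) only, the sketch below trains a small linear encoder whose output can be decoded back to the original pre-trained vector, so that the encoded vector keeps the information in the input. Everything specific here is an assumption made for illustration and is not taken from the paper or its repository: the paired decoder, the squared-error reconstruction loss, the toy 3-dimensional embeddings, and the names `matvec`, `noisy_identity`, `reconstruction_loss` and `train_epoch`. The criteria after (a) are cut off in the abstract excerpt above and are deliberately not modelled.

```python
import random

random.seed(0)
DIM = 3

# Hypothetical toy "pre-trained" embeddings (assumption: real ones would be loaded from a file).
embeddings = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.5],
    [0.4, 0.4, 0.9],
    [0.7, 0.6, 0.1],
]

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def noisy_identity(dim, scale=0.1):
    """Identity matrix plus small uniform noise, used as the initial weights."""
    return [[(1.0 if i == j else 0.0) + random.uniform(-scale, scale)
             for j in range(dim)] for i in range(dim)]

def reconstruction_loss(E, D, data):
    """Mean squared error between each vector and its decode(encode(vector))."""
    total = 0.0
    for w in data:
        w_hat = matvec(D, matvec(E, w))
        total += sum((a - b) ** 2 for a, b in zip(w_hat, w))
    return total / len(data)

def train_epoch(E, D, data, lr):
    """One pass of per-example gradient descent on the squared reconstruction error."""
    for w in data:
        z = matvec(E, w)                      # encoded vector
        w_hat = matvec(D, z)                  # reconstruction of the input
        err = [a - b for a, b in zip(w_hat, w)]
        # Gradient flowing back to z (computed before D is updated): D^T err
        grad_z = [sum(D[i][k] * err[i] for i in range(DIM)) for k in range(DIM)]
        for i in range(DIM):                  # dL/dD[i][k] = 2 * err[i] * z[k]
            for k in range(DIM):
                D[i][k] -= lr * 2 * err[i] * z[k]
        for k in range(DIM):                  # dL/dE[k][j] = 2 * grad_z[k] * w[j]
            for j in range(DIM):
                E[k][j] -= lr * 2 * grad_z[k] * w[j]

E = noisy_identity(DIM)   # encoder weights (hypothetical linear encoder)
D = noisy_identity(DIM)   # decoder weights (assumed helper, used only to measure retention)

initial_loss = reconstruction_loss(E, D, embeddings)
for _ in range(200):
    train_epoch(E, D, embeddings, lr=0.05)
final_loss = reconstruction_loss(E, D, embeddings)

print(f"reconstruction loss before: {initial_loss:.6f}, after: {final_loss:.6f}")
```

After training, `matvec(E, w)` gives the encoded version of an input vector `w`. In a full method, this retention term would presumably be combined with further objectives; this sketch optimises it in isolation.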

Code & Models

Repositories

kanekomasahiro/dict-debias (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Hate Speech and Cyberbullying Detection