Human-in-the-Loop Refinement of Word Embeddings

James Powell; Kari Sentz; Martin Klein

arXiv:2110.02884 · cs.CL · October 7, 2021 · 1 citation

TL;DR

This paper introduces an interactive, human-in-the-loop system for refining word embeddings to address biases and quality issues, enabling tailored, iterative improvements and a better understanding of how the embeddings affect machine learning tasks.

Contribution

It proposes a novel interactive refitting approach that allows humans to identify and correct biases in word embeddings iteratively and locally.

Findings

1. Enables fine-grained, organization-specific bias correction.
2. Facilitates iterative refinement of embeddings with human oversight.
3. Improves understanding of embedding impacts on downstream tasks.

Abstract

Word embeddings are a fixed, distributional representation of the context of words in a corpus learned from word co-occurrences. Despite their proven utility in machine learning tasks, word embedding models may capture uneven semantic and syntactic representations, and can inadvertently reflect various kinds of bias present within corpora upon which they were trained. It has been demonstrated that post-processing of word embeddings to apply information found in lexical dictionaries can improve the semantic associations, thus improving their quality. Building on this idea, we propose a system that incorporates an adaptation of word embedding post-processing, which we call "interactive refitting", to address some of the most daunting qualitative problems found in word embeddings. Our approach allows a human to identify and address potential quality issues with word embeddings…
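
The dictionary-based post-processing that the abstract builds on can be illustrated with a retrofitting-style update, in which each word vector is pulled toward the vectors of words a lexicon lists as related. The sketch below is not the authors' interactive refitting system, whose details are not given here. It is a minimal illustration under stated assumptions: the function name `retrofit`, the toy vectors, and the hand-written lexicon (standing in for human-supplied relations) are all hypothetical.

```python
import numpy as np


def retrofit(vectors, lexicon, iterations=10, alpha=1.0):
    """Pull each word vector toward its lexicon neighbours (retrofitting-style).

    vectors: dict mapping word -> 1-D numpy array (the original embeddings)
    lexicon: dict mapping word -> list of related words (hypothetical input,
             e.g. relations supplied by a human reviewer)
    alpha:   weight that keeps each word close to its original vector
    """
    original = {w: np.asarray(v, dtype=float) for w, v in vectors.items()}
    refined = {w: v.copy() for w, v in original.items()}

    for _ in range(iterations):
        for word, related in lexicon.items():
            neighbours = [n for n in related if n in refined and n != word]
            if word not in refined or not neighbours:
                continue  # nothing to pull this word toward
            # Each neighbour gets weight 1/degree, so neighbour weights sum to 1.
            beta = 1.0 / len(neighbours)
            neighbour_sum = beta * sum(refined[n] for n in neighbours)
            # Weighted average of the neighbour mean and the original vector.
            refined[word] = (neighbour_sum + alpha * original[word]) / (1.0 + alpha)
    return refined


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy usage: a reviewer asserts that "nurse" and "doctor" should be related.
toy_vectors = {
    "nurse": np.array([1.0, 0.0]),
    "doctor": np.array([0.0, 1.0]),
    "banana": np.array([-1.0, 0.0]),
}
toy_lexicon = {"nurse": ["doctor"], "doctor": ["nurse"]}
toy_refined = retrofit(toy_vectors, toy_lexicon)
print(cosine(toy_vectors["nurse"], toy_vectors["doctor"]))
print(cosine(toy_refined["nurse"], toy_refined["doctor"]))
```

Only words named in the lexicon move, so an edit of this kind stays local: in the toy example the vector for "banana" is left unchanged, while "nurse" and "doctor" are drawn closer together.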

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling