On Debiasing Text Embeddings Through Context Injection

Thomas Uriot

arXiv:2410.12874·cs.CL·October 21, 2024

TL;DR

This paper reviews 19 text embedding models, analyzing their biases and responsiveness to context injection, and introduces a dynamic retrieval algorithm to mitigate bias effects in NLP applications.

Contribution

It provides a comprehensive bias analysis of modern embedding models and proposes a novel context-based debiasing retrieval method.

Findings

01. Higher-performing models capture more biases but incorporate context better.

02. Models struggle to embed neutral semantics accurately.

03. Biases in embeddings can cause undesirable outcomes in retrieval tasks.

Abstract

Current advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications leveraging textual data. Generally, the core of these applications relies on having a good semantic representation of text as vectors, via embedding models. However, it has been shown that these embeddings capture and perpetuate biases already present in text. While a few techniques have been proposed to debias embeddings, they do not take advantage of the recent advances in context understanding of modern embedding models. In this paper, we fill this gap by conducting a review of 19 embedding models, quantifying their biases and how well they respond to context injection as a means of debiasing. We show that higher-performing models are more prone to capturing biases, but are also better at incorporating context. Surprisingly, we find that while models can easily embed…
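To make the idea of measuring bias before and after context injection concrete, here is a minimal Python sketch. It is not the paper's metric or algorithm: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `bias_gap`, `inject_context`, and the example sentences are names and inputs assumed purely for illustration. The gap is the difference in cosine similarity between a text and two opposing "pole" sentences.

```python
import math
from collections import Counter
from typing import Callable, Mapping

Vector = Mapping[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    tokens = text.lower().replace(".", " ").split()
    return Counter(tokens)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors stored as mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bias_gap(text: str, pole_a: str, pole_b: str,
             embed: Callable[[str], Vector] = toy_embed) -> float:
    """Signed gap: positive means `text` sits closer to pole_a than pole_b."""
    e = embed(text)
    return cosine(e, embed(pole_a)) - cosine(e, embed(pole_b))


def inject_context(text: str, context: str) -> str:
    """Assumed form of context injection: append a context sentence."""
    return f"{text} {context}"


if __name__ == "__main__":
    base = "This person is a nurse."
    pole_a, pole_b = "This person is a woman.", "This person is a man."
    injected = inject_context(base, "This person is a man.")
    print("gap before injection:", bias_gap(base, pole_a, pole_b))
    print("gap after injection: ", bias_gap(injected, pole_a, pole_b))
```

With the toy embedder the gap only reflects word overlap, so it says nothing about real models. Passing a real model's encoding function as `embed` (with `cosine` adapted to dense vectors) would be the natural way to try the same before/after comparison on actual embeddings.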

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems