On Debiasing Text Embeddings Through Context Injection
Thomas Uriot

TL;DR
This paper reviews 19 text embedding models, analyzing their biases and responsiveness to context injection, and introduces a dynamic retrieval algorithm to mitigate bias effects in NLP applications.
Contribution
It provides a comprehensive bias analysis of modern embedding models and proposes a novel context-based debiasing retrieval method.
Findings
Higher-performing models capture more biases but also incorporate context better.
Models struggle to embed neutral semantics accurately.
Biases in embeddings can cause undesirable outcomes in retrieval tasks.
Abstract
Current advances in Natural Language Processing (NLP) have made it increasingly feasible to build applications leveraging textual data. Generally, the core of these applications relies on having a good semantic representation of text as vectors, via embedding models. However, it has been shown that these embeddings capture and perpetuate biases already present in text. While a few techniques have been proposed to debias embeddings, they do not take advantage of the recent advances in context understanding of modern embedding models. In this paper, we fill this gap by conducting a review of 19 embedding models, quantifying their biases and how well they respond to context injection as a means of debiasing. We show that higher-performing models are more prone to capturing biases, but are also better at incorporating context. Surprisingly, we find that while models can easily embed…
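To make the idea of quantifying bias and then injecting context concrete, here is a minimal Python sketch. It is not the paper's method: the bias score (a difference of cosine similarities), the helper names `bias_score` and `inject_context`, and the tiny hand-made `TOY_VECTORS` table are all assumptions chosen for illustration. In practice `toy_embed` would be replaced by a real embedding model.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

# Hypothetical 3-d word vectors (occupation, female, male). Made-up numbers,
# chosen so that "nurse" leans toward the female axis; not from any real model.
TOY_VECTORS: Dict[str, Vector] = {
    "nurse": [1.0, 0.6, 0.1],
    "woman": [0.0, 1.0, 0.0],
    "man": [0.0, 0.0, 1.0],
    "gender": [0.0, 1.0, 1.0],
}


def toy_embed(text: str) -> Vector:
    """Stand-in embedder: sum the toy vectors of known words, ignore the rest."""
    total = [0.0, 0.0, 0.0]
    for word in text.lower().split():
        for i, value in enumerate(TOY_VECTORS.get(word, [0.0, 0.0, 0.0])):
            total[i] += value
    return total


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bias_score(embed: Callable[[str], Vector], text: str,
               group_a: str, group_b: str) -> float:
    """Similarity to group_a minus similarity to group_b (0 means no preference)."""
    e = embed(text)
    return cosine(e, embed(group_a)) - cosine(e, embed(group_b))


def inject_context(text: str, context: str) -> str:
    """Assumed form of context injection: append a neutralizing phrase."""
    return f"{text} {context}"


before = bias_score(toy_embed, "nurse", "woman", "man")
after = bias_score(toy_embed, inject_context("nurse", "of any gender"), "woman", "man")
print(f"bias before context: {before:.3f}, after context: {after:.3f}")
```

With a real model, the same two calls to `bias_score` would show whether the injected phrase actually moves the embedding toward neutrality, which is the kind of responsiveness the abstract describes.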
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
