Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB
Xingyu Ren, Youran Sun, Haoyu Liang

TL;DR
This paper introduces a simple, training-free renormalization method that corrects mean bias in text embeddings and yields statistically significant gains on MMTEB across 38 models.
Contribution
The paper proposes a lightweight, training-free renormalization technique to correct mean bias in text embeddings, improving performance on the MMTEB benchmark.
Findings
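Across 38 models on MMTEB, renormalization improves retrieval performance by 9.7σ
It improves classification performance by 3.1σ and performance on other task types by 0.8σ
The projection-based variant outperforms direct subtraction, as theoretically predicted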
Abstract
We find that current text embedding models produce outputs with a consistent bias, i.e., each embedding vector $e$ can be decomposed as $e = \tilde{e} + \mu$, where $\mu$ is almost identical across all sentences. We propose a plug-and-play, training-free and lightweight solution called Renormalization. Through extensive experiments, we show that renormalization consistently and statistically significantly improves the performance of existing models on the Massive Multilingual Text Embedding Benchmark (MMTEB). In particular, across 38 models, renormalization improves performance by 9.7σ on retrieval tasks, 3.1σ on classification tasks, and 0.8σ on other types of tasks. Renormalization has two variants: directly subtracting $\mu$ from $e$, or subtracting the projection of $e$ onto $\mu$. We theoretically predict that the latter performs better, and our experiments…
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies
