Correcting Mean Bias in Text Embeddings: A Refined Renormalization with Training-Free Improvements on MMTEB

Xingyu Ren; Youran Sun; Haoyu Liang

arXiv:2511.11041 · cs.CL · November 17, 2025

TL;DR

This paper introduces a simple, training-free renormalization method that corrects mean bias in text embeddings and statistically significantly improves the performance of existing models across task types on the MMTEB benchmark.

Contribution

The paper proposes a lightweight, training-free renormalization technique that corrects mean bias in text embeddings and improves the performance of existing models on MMTEB.
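To see why a shared bias component matters, here is a small diagnostic on synthetic data (not data from the paper). When every embedding carries the same vector μ, even unrelated texts get a high cosine similarity, and subtracting the corpus mean removes most of it. The helper name `mean_pairwise_cosine` and the synthetic setup (Gaussian ẽ plus a constant μ) are illustrative assumptions.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Exclude the diagonal, since self-similarity is always 1.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
n, d = 200, 64
mu = np.full(d, 1.0)                     # hypothetical shared bias vector
E = rng.normal(size=(n, d)) + mu         # synthetic e = e_tilde + mu

before = mean_pairwise_cosine(E)                    # high: the shared mu dominates
after = mean_pairwise_cosine(E - E.mean(axis=0))    # close to zero once the mean is removed
print(before, after)
```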

Findings

1. Across 38 models, renormalization improves retrieval performance by 9.7σ.

2. It improves classification performance by 3.1σ.

3. The projection-based variant performs better, as the theory predicts.
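Written out, the two renormalization variants described in the abstract, plus one property that follows directly from their definitions, are as follows. This is only the algebra implied by the decomposition $e = \tilde{e} + \mu$ (with $\langle \cdot, \cdot \rangle$ the dot product); it is not the paper's theoretical argument for why the projection variant performs better, which the abstract does not reproduce.

```latex
% Decomposition: e = \tilde{e} + \mu
% Variant 1 (subtract the mean vector):
e_{\text{sub}} = e - \mu = \tilde{e}
% Variant 2 (subtract the projection of e onto \mu):
e_{\text{proj}} = e - \frac{\langle e, \mu \rangle}{\lVert \mu \rVert^{2}}\,\mu
  = \tilde{e} - \frac{\langle \tilde{e}, \mu \rangle}{\lVert \mu \rVert^{2}}\,\mu
% The projection variant has no component along \mu:
\langle e_{\text{proj}}, \mu \rangle
  = \langle e, \mu \rangle - \frac{\langle e, \mu \rangle}{\lVert \mu \rVert^{2}}\,\lVert \mu \rVert^{2} = 0
```

The identity $e_{\text{sub}} = \tilde{e}$ holds only if $\mu$ is known exactly; in practice $\mu$ has to be estimated, so both variants operate on an approximation of it.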

Abstract

We find that current text embedding models produce outputs with a consistent bias: each embedding vector $e$ can be decomposed as $e = \tilde{e} + \mu$, where $\mu$ is almost identical across all sentences. We propose a plug-and-play, training-free and lightweight solution called Renormalization. Through extensive experiments, we show that renormalization consistently and statistically significantly improves the performance of existing models on the Massive Multilingual Text Embedding Benchmark (MMTEB). In particular, across 38 models, renormalization improves performance by $9.7\sigma$ on retrieval tasks, $3.1\sigma$ on classification tasks, and $0.8\sigma$ on other types of tasks. Renormalization has two variants: directly subtracting $\mu$ from $e$, or subtracting the projection of $e$ onto $\mu$. We theoretically predict that the latter performs better, and our experiments…
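A minimal NumPy sketch of the two variants, under two stated assumptions: $\mu$ is estimated as the mean embedding over a corpus, and each corrected vector is rescaled to unit length. Neither detail is specified in the abstract, and the names `estimate_mean` and `renormalize` are hypothetical, not the authors' code.

```python
import numpy as np

def estimate_mean(embeddings: np.ndarray) -> np.ndarray:
    """Assumption: estimate the shared bias mu as the average embedding over a corpus."""
    return embeddings.mean(axis=0)

def renormalize(embeddings: np.ndarray, mu: np.ndarray, variant: str = "projection") -> np.ndarray:
    """Remove the mean bias mu from each row of `embeddings`.

    variant="subtract":   e' = e - mu
    variant="projection": e' = e - (e . mu_hat) mu_hat, where mu_hat = mu / ||mu||
    """
    if variant == "subtract":
        out = embeddings - mu
    elif variant == "projection":
        mu_hat = mu / np.linalg.norm(mu)
        coeffs = embeddings @ mu_hat                  # component of each row along mu, shape (n,)
        out = embeddings - np.outer(coeffs, mu_hat)   # drop that component
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    # Assumption: rescale to unit length so cosine/dot-product scores stay comparable.
    return out / np.linalg.norm(out, axis=1, keepdims=True)

# Usage on synthetic data: e = e_tilde + mu with mu = (2, ..., 2).
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 8)) + 2.0
mu = estimate_mean(E)
R = renormalize(E, mu, variant="projection")
print(np.abs(R @ (mu / np.linalg.norm(mu))).max())  # effectively zero: rows are orthogonal to mu
```

Because $\mu$ is nearly constant across sentences, it can be estimated once from a corpus and reused for every query and document, which is what keeps the correction training-free.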

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Text and Document Classification Technologies