Why Mean Pooling Works: Quantifying Second-Order Collapse in Text Embeddings
Tomomasa Hara, Hiroto Kurita, Masaaki Imaizumi, Kentaro Inui, Sho Yokoi

TL;DR
This paper investigates why mean pooling is effective for text embeddings. It introduces a metric that quantifies second-order information collapse and analyzes how robustness to this collapse relates to model performance.
Contribution
It proposes a simple metric to measure second-order collapse in mean pooling and empirically shows modern encoders are robust to this collapse, explaining their effectiveness.
Findings
Modern text encoders are generally robust to second-order collapse.
Contrastively fine-tuned encoders tend to be less prone to collapse than their pretrained backbone models.
Robustness to collapse correlates with better downstream task performance.
Abstract
For constructing text embeddings, mean pooling, which averages token embeddings, is the standard approach. This paper examines whether mean pooling actually works well in real models. First, we note that mean pooling can collapse information beyond the first-order statistics of the token embeddings, such as second-order statistics that capture their spatial structure, potentially mapping distinct token embedding distributions to similar text embeddings. Motivated by this concern, we propose a simple metric to quantify such a collapse induced by mean pooling. Then, using this metric, we empirically measure how often this collapse occurs in actual models and texts, and find that modern text encoders are robust to this collapse. In particular, contrastive fine-tuned text encoders tend to be less prone to the collapse than their pretrained backbone models. We also find that the robustness…
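The concern raised in the abstract can be made concrete with a toy example. The sketch below is not the paper's metric. It only assumes NumPy and two hand-built sets of 2-D "token embeddings" that share the same mean but are spread along different axes. The names `mean_pool` and `second_moment` are hypothetical helpers introduced for this illustration.

```python
import numpy as np

def mean_pool(token_embeddings):
    # First-order statistic: average the token vectors over the token axis.
    return token_embeddings.mean(axis=0)

def second_moment(token_embeddings):
    # Second-order statistic: covariance of the token vectors around their mean.
    centered = token_embeddings - token_embeddings.mean(axis=0)
    return centered.T @ centered / len(token_embeddings)

# Two toy "texts" (4 tokens, 2 dimensions each), chosen by hand for illustration.
# text_a is spread mostly along the first axis, text_b mostly along the second.
text_a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.1], [0.0, -0.1]])
text_b = np.array([[0.1, 0.0], [-0.1, 0.0], [0.0, 1.0], [0.0, -1.0]])

# Distance between the mean-pooled text embeddings.
pooled_gap = np.linalg.norm(mean_pool(text_a) - mean_pool(text_b))

# Frobenius distance between the token covariance matrices.
cov_gap = np.linalg.norm(second_moment(text_a) - second_moment(text_b))

print("distance between pooled embeddings:", pooled_gap)
print("distance between token covariances:", cov_gap)
```

In this constructed case the pooled embeddings coincide while the covariances differ clearly, which is the kind of second-order collapse the abstract describes. Whether real encoders produce such token distributions is the empirical question the paper addresses.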
