Isotropic Representation Can Improve Dense Retrieval
Euna Jung, Jungwon Park, Jaekeol Choi, Sungyoon Kim, Wonjong Rhee

TL;DR
This paper demonstrates that making BERT-based dense retrieval representations isotropic through unsupervised post-processing improves relevance scoring, robustness, and out-of-distribution performance.
Contribution
It applies two unsupervised post-processing methods, Normalizing Flow and whitening, to make dense retrieval representations isotropic, and develops a token-wise variant alongside the sequence-wise one, improving effectiveness and robustness.
Findings
Performance improvements in document re-ranking (up to 22.81%)
Enhanced robustness in out-of-distribution tasks (up to 24.98% improvement)
Isotropic representations outperform anisotropic ones in various settings
Abstract
Recent advances in language representation modeling have broadly affected the design of dense retrieval models. In particular, many high-performing dense retrieval models compute representations of the query and the document using BERT, and then apply cosine-similarity-based scoring to determine relevance. BERT representations, however, are known to follow an anisotropic distribution with a narrow cone shape, and such a distribution can be undesirable for cosine-similarity-based scoring. In this work, we first show that BERT-based DR also follows an anisotropic distribution. To cope with the problem, we introduce the unsupervised post-processing methods of Normalizing Flow and whitening, and develop a token-wise method in addition to the sequence-wise method for applying them to the representations of dense retrieval models. We show…
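To make the whitening idea in the abstract concrete, here is a minimal sketch, assuming a standard PCA-style whitening transform and synthetic embeddings. It is not the authors' implementation, and the function names (`fit_whitening`, `whiten`, `mean_pairwise_cosine`) are hypothetical. It measures anisotropy as the mean cosine similarity between distinct vectors, which is close to 1 when all vectors sit in a narrow cone and close to 0 after whitening.

```python
import numpy as np

def fit_whitening(X, eps=1e-8):
    """Estimate the mean and a whitening matrix from embeddings X of shape (n, d)."""
    mu = X.mean(axis=0, keepdims=True)
    cov = np.cov(X - mu, rowvar=False)
    # Covariance is symmetric, so eigh gives real eigenvalues and orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each eigenvector (column) by 1/sqrt(eigenvalue): W = U diag(lambda^-1/2).
    W = eigvecs / np.sqrt(eigvals + eps)
    return mu, W

def whiten(X, mu, W):
    """Center X and map it to a space with (approximately) identity covariance."""
    return (X - mu) @ W

def mean_pairwise_cosine(X):
    """Average cosine similarity over all pairs of distinct rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    n = len(X)
    return (S.sum() - np.trace(S)) / (n * (n - 1))

# Synthetic stand-in for encoder outputs (an assumption, not real BERT vectors):
# a shared offset plus unequal per-dimension scales yields a narrow-cone distribution.
rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 16)) * np.linspace(0.1, 2.0, 16) + 5.0

mu, W = fit_whitening(emb)
emb_w = whiten(emb, mu, W)

before = mean_pairwise_cosine(emb)   # high: vectors point in nearly the same direction
after = mean_pairwise_cosine(emb_w)  # near zero: directions are spread out
print(f"mean pairwise cosine before: {before:.3f}, after: {after:.3f}")
```

In this sketch the rows of `X` can be either one vector per sequence or one vector per token, which loosely mirrors the sequence-wise and token-wise settings mentioned above. How the paper fits and applies the transform in each setting is not specified in this summary.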
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Test · Linear Layer · WordPiece · Layer Normalization · Softmax · Linear Warmup With Linear Decay · Adam · Multi-Head Attention · Dense Connections
