What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
Ori Ram, Liat Bezalel, Adi Zicher, Yonatan Belinkov, Jonathan Berant, Amir Globerson

TL;DR
This paper investigates how dual encoders for dense retrieval represent text by projecting their vectors into the model's vocabulary space. It shows that the projections carry rich semantic information and proposes enriching representations with lexical information, which improves zero-shot retrieval performance.
Contribution
It introduces a novel interpretation of dual encoder representations as distributions over the vocabulary, connects dense and sparse retrieval, and proposes a method to enrich query and passage representations with lexical information.
Findings
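Projections of dual encoder representations into vocabulary space contain rich semantic information.
Enriching query and passage representations with lexical information improves zero-shot retrieval performance.
Failures on tail entities are correlated with the token distributions forgetting some of the tokens of those entities.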
Abstract
Dual encoders are now the dominant architecture for dense retrieval. Yet, we have little understanding of how they represent text, and why this leads to good performance. In this work, we shed light on this question via distributions over the vocabulary. We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space. We show that the resulting projections contain rich semantic information, and draw a connection between them and sparse retrieval. We find that this view can offer an explanation for some of the failure cases of dense retrievers. For example, we observe that the inability of models to handle tail entities is correlated with a tendency of the token distributions to forget some of the tokens of those entities. We leverage this insight and propose a simple way to enrich query and passage representations with…
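The projection described in the abstract can be illustrated with a minimal sketch. It assumes the projection is a linear map from the embedding space to one logit per vocabulary token, followed by a softmax; the abstract does not spell out the exact projection, so this is an illustration and not the authors' implementation. The names project_to_vocab and top_tokens, the four-token vocabulary, the matrix values, and the query vector are all hypothetical toy data.

```python
import math


def project_to_vocab(vec, vocab_matrix):
    """Map a dense vector to a probability distribution over the vocabulary.

    Assumption: one weight row per vocabulary token, so each logit is the
    dot product of that row with the vector, followed by a softmax.
    """
    logits = [sum(w * x for w, x in zip(row, vec)) for row in vocab_matrix]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_tokens(probs, vocab, k=3):
    """Return the k (token, probability) pairs with the highest probability."""
    ranked = sorted(zip(vocab, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data, invented for illustration only.
vocab = ["paris", "france", "capital", "banana"]
vocab_matrix = [
    [1.0, 0.0, 0.5],    # paris
    [0.8, 0.2, 0.4],    # france
    [0.4, 0.1, 1.0],    # capital
    [-1.0, 0.3, -0.5],  # banana
]
query_vec = [2.0, 0.0, 1.0]  # stands in for a dual encoder's query vector

probs = project_to_vocab(query_vec, vocab_matrix)
top = top_tokens(probs, vocab, k=2)
for token, prob in top:
    print(f"{token}: {prob:.3f}")
```

Reading off the highest-probability tokens is one way to inspect what a representation encodes. Under this view, a tail entity whose tokens receive little probability mass would be an instance of the token forgetting that the abstract describes.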
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
