On Single and Multiple Representations in Dense Passage Retrieval

Craig Macdonald; Nicola Tonellotto; Iadh Ounis

arXiv:2108.06279 · cs.IR · August 20, 2021 · 5 cites

TL;DR

This paper compares single-representation and multiple-representation dense passage retrieval methods, finding that multiple representations are generally more effective than single ones, especially for complex or difficult queries, but less efficient.

Contribution

It provides a direct comparison of single-representation and multiple-representation dense retrieval methods, highlighting their relative strengths and weaknesses across different query types.

Findings

01. Multiple representations outperform single representations in MAP and MRR@10.

02. Multiple representations are more effective for complex, definitional, and difficult queries.

03. Single representations like ANCE are more efficient in response time and memory usage.
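
The two effectiveness measures named in the findings, MAP and MRR@10, can be sketched as follows. This is a minimal illustration using the standard textbook definitions, not the evaluation code used in the paper; the function names and the toy passage ids are assumptions made for the example.

```python
from typing import Dict, List, Set


def reciprocal_rank_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Reciprocal of the rank of the first relevant passage within the top k, else 0."""
    for rank, pid in enumerate(ranked_ids[:k], start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant passage appears."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_over_queries(per_query_scores: Dict[str, float]) -> float:
    """MRR@10 and MAP are the per-query values averaged over all queries."""
    return sum(per_query_scores.values()) / len(per_query_scores)


# Toy data (hypothetical ids): one ranking and one relevance set per query.
rankings = {"q1": ["p3", "p1", "p7"], "q2": ["a", "b", "c", "d"]}
relevant = {"q1": {"p1"}, "q2": {"a", "c"}}

mrr_at_10 = mean_over_queries(
    {q: reciprocal_rank_at_k(rankings[q], relevant[q]) for q in rankings}
)
mean_ap = mean_over_queries(
    {q: average_precision(rankings[q], relevant[q]) for q in rankings}
)
```

MRR@10 rewards only the first relevant passage in the top ten, while MAP accounts for every relevant passage in the ranking, so the two can disagree when a query has several relevant passages.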

Abstract

The advent of contextualised language models has brought gains in search effectiveness, not just when applied for re-ranking the output of classical weighting models such as BM25, but also when used directly for passage indexing and retrieval, a technique which is called dense retrieval. In the existing literature in neural ranking, two dense retrieval families have become apparent: single representation, where entire passages are represented by a single embedding (usually BERT's [CLS] token, as exemplified by the recent ANCE approach), or multiple representations, where each token in a passage is represented by its own embedding (as exemplified by the recent ColBERT approach). These two families have not been directly compared. However, because of the likely importance of dense retrieval moving forward, a clear understanding of their advantages and disadvantages is paramount. To this…
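
The difference between the two families described in the abstract shows up most clearly in how a query-passage score is computed. The sketch below is a simplified illustration under stated assumptions: single-representation scoring is shown as a dot product between one query embedding and one passage embedding, and multiple-representation scoring as the sum, over query tokens, of the maximum similarity to any passage token (the late-interaction scoring described in the ColBERT paper). The function names and toy vectors are hypothetical, and no encoder is involved; real systems obtain the embeddings from a BERT-based model.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_representation_score(query_emb: Vector, passage_emb: Vector) -> float:
    """One embedding per query and per passage: the score is a single dot product."""
    return dot(query_emb, passage_emb)


def multiple_representation_score(
    query_token_embs: Sequence[Vector], passage_token_embs: Sequence[Vector]
) -> float:
    """One embedding per token: for each query token take its best-matching
    passage token, then sum those maxima over the query tokens."""
    return sum(
        max(dot(q, p) for p in passage_token_embs) for q in query_token_embs
    )


# Toy 2-dimensional embeddings (made up for illustration).
single_score = single_representation_score([1.0, 2.0], [3.0, 4.0])
multi_score = multiple_representation_score(
    [[1.0, 0.0], [0.0, 1.0]],   # two query tokens
    [[1.0, 0.0], [0.5, 0.5]],   # two passage tokens
)
```

The multiple-representation score needs one stored vector per passage token rather than one per passage, which is consistent with the finding that single representations use less memory and respond faster.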

Code & Models

Repositories

terrierteam/pyterrier_colbert (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Web Data Mining and Analysis