Query Embedding Pruning for Dense Retrieval

Nicola Tonellotto; Craig Macdonald

arXiv:2108.10341·cs.IR·August 25, 2021

TL;DR

This paper introduces a method for pruning query embeddings in multi-representation dense retrieval systems such as ColBERT. Pruning lowers retrieval cost and response time without a significant loss in effectiveness.

Contribution

The paper is the first to consider efficiency improvements for a dense retrieval approach (ColBERT) through query embedding pruning. It reports a substantial speedup and far fewer retrieved documents, with no significant loss in retrieval effectiveness.

Findings

01. 70% reduction in documents retrieved
02. 2.65x speedup in response time
03. No significant loss in retrieval effectiveness
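To put the two efficiency figures on the same footing, the short sketch below converts a speedup factor into the fraction of baseline response time that remains, and a percentage reduction into a remaining document count. It assumes that "speedup" means baseline time divided by pruned time; the baseline of 10,000 documents is a made-up number for illustration, not a figure from the paper.

```python
def fraction_remaining(speedup: float) -> float:
    """Fraction of the baseline response time left after a given speedup factor."""
    return 1.0 / speedup


def docs_remaining(baseline_docs: int, reduction: float) -> int:
    """Documents still retrieved after a fractional reduction (0.70 means 70% fewer)."""
    return round(baseline_docs * (1.0 - reduction))


# 2.65x speedup: roughly 38% of the baseline time remains (about 62% less time).
remaining_time = fraction_remaining(2.65)

# 70% reduction on a hypothetical baseline of 10,000 retrieved documents.
remaining_docs = docs_remaining(10_000, 0.70)
```

Under that reading, a 2.65x speedup and a 70% reduction in retrieved documents are of similar magnitude: each leaves roughly a third of the baseline.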

Abstract

Recent advances in dense retrieval techniques have offered the promise of being able not just to re-rank documents using contextualised language models such as BERT, but also to use such models to identify documents from the collection in the first place. However, when using dense retrieval approaches that use multiple embedded representations for each query, a large number of documents can be retrieved for each query, hindering the efficiency of the method. Hence, this work is the first to consider efficiency improvements in the context of a dense retrieval approach (namely ColBERT), by pruning query term embeddings that are estimated not to be useful for retrieving relevant documents. Our proposed query embeddings pruning reduces the cost of the dense retrieval operation, as well as reducing the number of documents that are retrieved and hence require to be fully scored. Experiments…
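The idea in the abstract can be illustrated with a minimal sketch of two-stage, multi-representation retrieval in the style of ColBERT. Several parts are assumptions made for illustration rather than details taken from the paper: the `usefulness` scores are a hypothetical per-embedding estimate supplied by the caller, exhaustive dot-product search stands in for an approximate nearest-neighbour index, and the second stage scores candidates with the full, unpruned query. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prune_query(usefulness: Sequence[float], keep: int) -> List[int]:
    """Indices of the `keep` query embeddings with the highest usefulness estimate."""
    ranked = sorted(range(len(usefulness)), key=lambda i: usefulness[i], reverse=True)
    return sorted(ranked[:keep])


def candidate_docs(query_embs: Sequence[Vector], kept: Sequence[int],
                   docs: Dict[str, List[Vector]], per_embedding: int) -> Set[str]:
    """First stage: each kept query embedding pulls in the documents that own
    its `per_embedding` most similar document token embeddings."""
    candidates: Set[str] = set()
    for i in kept:
        scored = [(dot(query_embs[i], tok), doc_id)
                  for doc_id, toks in docs.items() for tok in toks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        candidates.update(doc_id for _, doc_id in scored[:per_embedding])
    return candidates


def maxsim(query_embs: Sequence[Vector], doc_embs: Sequence[Vector]) -> float:
    """Late-interaction score: sum over query embeddings of the best document match."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


def retrieve(query_embs: Sequence[Vector], usefulness: Sequence[float],
             docs: Dict[str, List[Vector]], keep: int,
             per_embedding: int) -> List[Tuple[float, str]]:
    """Prune the query, gather candidates, then fully score only those candidates."""
    kept = prune_query(usefulness, keep)
    candidates = candidate_docs(query_embs, kept, docs, per_embedding)
    # Assumption: full scoring still uses every query embedding.
    return sorted(((maxsim(query_embs, docs[d]), d) for d in candidates), reverse=True)


# Toy data: three query embeddings, three documents with 2-d token embeddings.
query_embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
usefulness = [0.9, 0.1, 0.5]  # hypothetical estimates, higher means more useful
docs = {"a": [[1.0, 0.0], [0.0, 1.0]], "b": [[0.8, 0.6]], "c": [[0.0, 1.0]]}

unpruned = candidate_docs(query_embs, [0, 1, 2], docs, per_embedding=2)
pruned = candidate_docs(query_embs, prune_query(usefulness, keep=1), docs, per_embedding=2)
ranking = retrieve(query_embs, usefulness, docs, keep=1, per_embedding=2)
```

In this toy setup, keeping a single query embedding shrinks the candidate set from three documents to two, so fewer documents reach the more expensive full-scoring step. This is the trade-off the abstract describes: fewer nearest-neighbour lookups and fewer documents to score, at the risk of missing documents that only the pruned embeddings would have surfaced.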

Code & Models

Repositories

terrierteam/pyterrier_colbert (PyTorch, official)

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · WordPiece · Layer Normalization · Adam · Residual Connection · Weight Decay