End-to-End Retrieval in Continuous Space

Daniel Gillick; Alessandro Presta; Gaurav Singh Tomar

arXiv:1811.08008 · cs.IR · November 21, 2018 · 71 citations

Open Access · 1 Model

TL;DR

This paper explores end-to-end continuous-space retrieval, in which approximate nearest neighbor search over learned embeddings replaces the discrete inverted index. It improves on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks.

Contribution

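The paper trains simple models specifically for retrieval and relies entirely on distances between learned embeddings, with approximate nearest neighbor search in place of a discrete inverted index. It also shows how to modify existing pairwise similarity datasets so they can be used to evaluate retrieval systems.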

Findings

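01 8% MAP improvement over a discrete baseline on one similar-question retrieval task

02 26% MAP improvement over a discrete baseline on the other similar-question retrieval task

03 Existing pairwise similarity datasets modified for retrieval evaluation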
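
The MAP figures above refer to mean average precision. As a rough illustration of how that metric is typically computed, here is a minimal sketch in plain Python. The function names and the toy document ids are hypothetical, and the paper's exact evaluation protocol (for example, any rank cutoff) is not given on this page, so this shows only the standard textbook definition.

```python
from typing import Hashable, Iterable, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked result list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item appears.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Iterable[Sequence[Hashable]],
    relevants: Iterable[Set[Hashable]],
) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with two queries (hypothetical ids).
rankings = [["a", "b", "c"], ["x", "y"]]
relevants = [{"a", "c"}, {"y"}]
map_score = mean_average_precision(rankings, relevants)
```

In this toy case the first query scores (1/1 + 2/3) / 2 and the second scores 1/2, so `map_score` is their mean, about 0.667.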

Abstract

Most text-based information retrieval (IR) systems index objects by words or phrases. These discrete systems have been augmented by models that use embeddings to measure similarity in continuous space. But continuous-space models are typically used just to re-rank the top candidates. We consider the problem of end-to-end continuous retrieval, where standard approximate nearest neighbor (ANN) search replaces the usual discrete inverted index, and rely entirely on distances between learned embeddings. By training simple models specifically for retrieval, with an appropriate model architecture, we improve on a discrete baseline by 8% and 26% (MAP) on two similar-question retrieval tasks. We also discuss the problem of evaluation for retrieval systems, and show how to modify existing pairwise similarity datasets for this purpose.
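
To make the retrieval setup in the abstract concrete, the sketch below ranks documents purely by similarity between embedding vectors, with no inverted index involved. It rests on several assumptions: the vectors are hand-written stand-ins for the output of a learned encoder, cosine similarity is used as the scoring function, and an exact brute-force scan takes the place of an approximate nearest neighbor index. All names (`build_index`, `retrieve`, the document ids) are hypothetical and are not taken from the paper.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def build_index(doc_vectors: Dict[str, Vector]) -> Dict[str, List[float]]:
    """Store unit-normalized document embeddings keyed by document id."""
    return {doc_id: normalize(v) for doc_id, v in doc_vectors.items()}


def retrieve(
    query_vector: Vector, index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k documents most similar to the query, best first.

    This is an exact linear scan; an ANN library would replace this step
    at scale, trading a little accuracy for speed.
    """
    q = normalize(query_vector)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, v))) for doc_id, v in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Stand-in embeddings; in practice these would come from a trained encoder.
index = build_index({"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [1.0, 1.0]})
results = retrieve([2.0, 0.1], index, k=2)
top_ids = [doc_id for doc_id, _ in results]
```

Here the query points almost along the first axis, so `d1` ranks first and `d3` second. Because scores depend only on vector geometry, the quality of such a system rests entirely on how the embeddings were trained.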

Code & Models

Models

🤗
GlassLewis/roberta-large-entity-linking
model · 41 downloads · 3 likes

Taxonomy

Topics: Topic Modeling · Algorithms and Data Compression · Advanced Image and Video Retrieval Techniques