Autoregressive Search Engines: Generating Substrings as Document Identifiers
Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Wen-tau Yih, Sebastian Riedel, Fabio Petroni

TL;DR
This paper introduces a novel autoregressive retrieval method that generates passage identifiers using ngrams, outperforming previous approaches and establishing new state-of-the-art results with lower memory usage.
Contribution
It proposes a structure-free ngram-based autoregressive retrieval approach that improves accuracy and efficiency over prior hierarchical methods.
Findings
Outperforms prior autoregressive retrieval methods.
Achieves an improvement of at least 10 points on the KILT benchmark.
Uses less memory than competing systems.
Abstract
Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive language models are emerging as the de-facto standard for generating answers, with newer and more powerful systems emerging at an astonishing pace. In this paper we argue that all this (and future) progress can be directly applied to the retrieval problem with minimal intervention to the models' architecture. Previous work has explored ways to partition the search space into hierarchical structures and retrieve documents by autoregressively generating their unique identifier. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers. This setup allows us to use an autoregressive model to generate and score distinctive ngrams, that are…
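To make the idea of "all ngrams in a passage as its possible identifiers" concrete, here is a minimal Python sketch under simplifying assumptions. Passages are whitespace-tokenized. A plain dictionary from ngram to passage ids stands in for the substring index a real system would need. The (ngram, score) pairs are hard-coded placeholders for what the autoregressive model would generate. The helper names (`ngrams`, `build_index`, `rank`) and the sum-of-scores ranking rule are hypothetical illustrations, not the paper's scoring function.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], max_n: int) -> Set[Ngram]:
    """Return every contiguous ngram of length 1..max_n in a token list."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def build_index(passages: Dict[str, str], max_n: int = 3) -> Dict[Ngram, Set[str]]:
    """Map each ngram to the ids of the passages that contain it.

    Every ngram of a passage acts as one of its possible identifiers;
    no hierarchy is imposed on the corpus.
    """
    index: Dict[Ngram, Set[str]] = defaultdict(set)
    for pid, text in passages.items():
        for gram in ngrams(text.lower().split(), max_n):
            index[gram].add(pid)
    return index


def rank(index: Dict[Ngram, Set[str]],
         generated: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Score passages by summing the scores of the generated ngrams they contain.

    `generated` is a placeholder for (ngram, score) pairs an autoregressive
    model would produce; summing scores is a hypothetical aggregation rule.
    """
    scores: Dict[str, float] = defaultdict(float)
    for text, score in generated:
        # .get avoids inserting unseen ngrams into the defaultdict.
        for pid in index.get(tuple(text.lower().split()), ()):
            scores[pid] += score
    # Highest score first; ties broken by passage id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


passages = {
    "p1": "Autoregressive models generate text one token at a time",
    "p2": "Search engines retrieve documents from a large corpus",
    "p3": "Autoregressive search engines generate substrings as identifiers",
}
index = build_index(passages, max_n=3)
# Hypothetical model output: ngrams with made-up scores.
generated = [("search engines", 0.5), ("generate substrings", 0.25), ("one token", 0.125)]
print(rank(index, generated))
# → [('p3', 0.75), ('p2', 0.5), ('p1', 0.125)]
```

Materializing every ngram in a dictionary grows quickly with `max_n` and corpus size, so this toy illustrates only the ngram-to-passage mapping, not how a memory-efficient retriever would store it.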
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
