Autoregressive Search Engines: Generating Substrings as Document Identifiers

Michele Bevilacqua; Giuseppe Ottaviano; Patrick Lewis; Wen-tau Yih; Sebastian Riedel; Fabio Petroni

arXiv:2204.10628·cs.CL·April 25, 2022·66 cites

TL;DR

This paper introduces an autoregressive retrieval method that uses the ngrams of a passage as its identifiers, outperforming previous approaches and establishing new state-of-the-art results with lower memory usage.

Contribution

It proposes a structure-free ngram-based autoregressive retrieval approach that improves accuracy and efficiency over prior hierarchical methods.

Findings

01. Outperforms prior autoregressive retrieval methods.
02. Achieves at least a 10-point improvement on the KILT benchmark.
03. Uses less memory than competing systems.

Abstract

Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive language models are emerging as the de-facto standard for generating answers, with newer and more powerful systems emerging at an astonishing pace. In this paper we argue that all this (and future) progress can be directly applied to the retrieval problem with minimal intervention to the models' architecture. Previous work has explored ways to partition the search space into hierarchical structures and retrieve documents by autoregressively generating their unique identifier. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers. This setup allows us to use an autoregressive model to generate and score distinctive ngrams, that are…
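The central idea in the abstract, treating every ngram of a passage as one of its possible identifiers, can be illustrated with a toy sketch. This is not the authors' implementation: the dictionary index, the word-level tokenization, the `max_n` cutoff, the function names, and the rule of summing ngram scores per passage are all assumptions made for illustration, and the `(ngram, score)` pairs are hand-written stand-ins for what an autoregressive model would generate and score.

```python
from collections import defaultdict


def build_ngram_index(passages, max_n=3):
    """Map every word ngram of length 1..max_n to the ids of passages containing it."""
    index = defaultdict(set)
    for pid, text in passages.items():
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                index[" ".join(tokens[i:i + n])].add(pid)
    return index


def rank_passages(index, scored_ngrams):
    """Sum the scores of the generated ngrams per passage, best passage first.

    Summation is an assumed aggregation rule, chosen only for simplicity.
    """
    totals = defaultdict(float)
    for ngram, score in scored_ngrams:
        # .get avoids creating empty entries for ngrams absent from the corpus
        for pid in index.get(ngram.lower(), ()):
            totals[pid] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy corpus (hypothetical data).
passages = {
    "p1": "the eiffel tower is in paris",
    "p2": "the colosseum is in rome",
}
index = build_ngram_index(passages)

# Hypothetical model output: ngrams with scores, as if generated for a query.
generated = [("eiffel tower", 2.0), ("is in", 0.5), ("rome", 1.0)]
ranking = rank_passages(index, generated)
print(ranking)
```

Because any ngram can point to any passage that contains it, no hierarchy or unique identifier has to be assigned to documents in advance; a distinctive ngram such as "eiffel tower" narrows the candidates sharply, while a common one such as "is in" matches many passages and contributes little to telling them apart.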

Code & Models

Videos

Autoregressive Search Engines: Generating Substrings as Document Identifiers · SlidesLive

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications