Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

Zhengbao Jiang; Luyu Gao; Jun Araki; Haibo Ding; Zhiruo Wang; Jamie Callan; Graham Neubig

arXiv:2212.02027 · cs.CL · December 6, 2022 · 1 citation

TL;DR

This paper introduces ReAtt, a single Transformer model that performs retrieval and reading for question answering tasks end-to-end, simplifying training and improving adaptability across domains.

Contribution

It proposes a novel approach that trains retrieval and reading end-to-end within one Transformer, eliminating the need for separate retriever and reader models and separate training stages.

Findings

1. Achieves competitive retrieval and QA performance with a single model.
2. Matches or slightly outperforms state-of-the-art retrievers and readers that are trained separately.
3. Significantly improves out-of-domain performance in both supervised and unsupervised settings.

Abstract

Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end…
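The core idea of using a Transformer's own attention scores as retrieval scores can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation (the official code is in the jzbjyb/reatt repository). It assumes query and document token vectors have already been projected into attention query/key space, and it assumes a ColBERT-style aggregation in which each question token keeps its highest scaled dot-product score over a document's tokens and these maxima are summed. The function names `attention_relevance` and `retrieve` are hypothetical, and the softmax over documents is just one illustrative way to turn scores into a retrieval distribution.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_relevance(query_vecs: List[Vec], doc_vecs: List[Vec]) -> float:
    """Hypothetical document score built from token-level attention logits.

    Assumption: query_vecs are attention queries for the question tokens and
    doc_vecs are attention keys for one document's tokens. Each query token
    keeps its best-matching document token (max), and the per-token maxima
    are summed (ColBERT-style aggregation, assumed here).
    """
    scale = 1.0 / math.sqrt(len(query_vecs[0]))  # standard dot-product attention scaling
    return sum(max(scale * dot(q, k) for k in doc_vecs) for q in query_vecs)


def retrieve(query_vecs: List[Vec], corpus: List[List[Vec]],
             top_k: int) -> Tuple[List[int], List[float]]:
    """Rank documents by attention relevance and return a softmax over documents."""
    scores = [attention_relevance(query_vecs, doc) for doc in corpus]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], probs


# Toy example with 2-dimensional vectors and three tiny "documents".
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    [[0.5, 0.5]],              # partial match
    [[-1.0, 0.0]],             # mismatch
]
top, probs = retrieve(query, corpus, top_k=2)
print(top)  # [0, 1]
```

In a full model these vectors would come from the Transformer's own attention layers rather than being written by hand, so retrieval and reading share parameters. How the QA loss supervises these scores is described in the paper and is not reproduced in this sketch.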

Code & Models

Repositories

jzbjyb/reatt (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Linear Layer · Dense Connections · Adam