Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer
Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie, Callan, Graham Neubig

TL;DR
This paper introduces ReAtt, a single Transformer model that performs retrieval and reading for question answering tasks end-to-end, simplifying training and improving adaptability across domains.
Contribution
It proposes a novel end-to-end training approach for retrieval and reading within one Transformer, eliminating the need for separate models and training stages.
Findings
Achieves competitive retrieval and QA performance with a single model.
Outperforms or matches state-of-the-art methods trained separately.
Significantly improves out-of-domain performance in supervised and unsupervised settings.
Abstract
Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
MethodsMulti-Head Attention · Attention Is All You Need · Layer Normalization · Softmax · Dropout · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Linear Layer · Dense Connections · Adam
