You Only Need One Model for Open-domain Question Answering

Haejun Lee, Akhil Kedia, Jongwon Lee, Ashwin Paranjape, Christopher D. Manning, and Kyoung-Gu Woo

arXiv:2112.07381·cs.CL·October 31, 2022

TL;DR

This paper introduces a unified transformer-based model for open-domain question answering that integrates retrieval, reranking, and reading into a single end-to-end trainable system, improving efficiency and performance.

Contribution

The authors propose a novel architecture that combines retrieval, reranking, and reading within one transformer model, trained end-to-end, replacing multiple separate models.

Findings

01

Outperforms previous state-of-the-art models on Natural Questions and TriviaQA datasets.

02

Achieves better gradient flow and more efficient use of model capacity.

03

Improves exact match scores over the previous state of the art by 1.0 points on Natural Questions and 0.7 points on TriviaQA.

Abstract

Recent approaches to Open-domain Question Answering refer to an external knowledge base using a retriever model, optionally rerank passages with a separate reranker model and generate an answer using another reader model. Despite performing related tasks, the models have separate parameters and are weakly-coupled during training. We propose casting the retriever and the reranker as internal passage-wise attention mechanisms applied sequentially within the transformer architecture and feeding computed representations to the reader, with the hidden representations progressively refined at each stage. This allows us to use a single question answering model trained end-to-end, which is a more efficient use of model capacity and also leads to better gradient flow. We present a pre-training method to effectively train this architecture and evaluate our model on the Natural Questions and…
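To make the idea of "retriever and reranker as passage-wise attention applied sequentially" more concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the function names (`passage_attention`, `refine`, `single_model_pipeline`), the dot-product scoring, the top-k selection, and the residual update are all assumptions chosen for illustration, and passages are represented as single small vectors rather than token sequences.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def passage_attention(query, passages, keep):
    """Score every passage against the query and keep the top `keep`.

    Hypothetical helper: scaled dot-product scores, one per passage.
    Returns (kept indices, attention weights renormalised over the kept set).
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, p) / scale for p in passages]
    top = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)[:keep]
    weights = softmax([scores[i] for i in top])
    return top, weights


def refine(hidden, passages, weights):
    """Toy stand-in for a transformer block: residual update of the hidden
    state with the attention-weighted mix of the kept passages."""
    mixed = [sum(w * p[d] for w, p in zip(weights, passages))
             for d in range(len(hidden))]
    return [h + m for h, m in zip(hidden, mixed)]


def single_model_pipeline(question, passages, retrieve_k, rerank_k):
    """Two sequential passage-wise attention stages sharing one hidden state."""
    # Stage 1 (retrieval-like): coarse selection over all passages.
    idx1, w1 = passage_attention(question, passages, retrieve_k)
    kept1 = [passages[i] for i in idx1]
    hidden = refine(question, kept1, w1)

    # Stage 2 (reranking-like): same mechanism, applied to the survivors
    # using the refined hidden state from stage 1.
    local, w2 = passage_attention(hidden, kept1, rerank_k)
    idx2 = [idx1[i] for i in local]
    kept2 = [kept1[i] for i in local]
    hidden = refine(hidden, kept2, w2)

    # A reader stage would consume `hidden` and the kept passages here.
    return idx2, hidden


if __name__ == "__main__":
    question = [1.0, 0.0]
    passages = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    selected, hidden = single_model_pipeline(question, passages, retrieve_k=3, rerank_k=1)
    print(selected, hidden)
```

The point of the sketch is only the control flow: both selection stages reuse the same scoring mechanism and operate on one progressively refined representation, instead of calling three separately parameterised models. In a trainable version the hard top-k step would need a differentiable treatment, which this toy does not attempt.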

