Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization

Ansong Ni; Matt Gardner; Pradeep Dasigi

arXiv:2103.12235·cs.CL·September 10, 2021·1 cites

TL;DR

This paper introduces a retrieval marginalization technique for multi-document QA that handles unanswerable questions and mitigates false negatives in evidence annotations, reaching a state-of-the-art 50.5 F1 on IIRC.

Contribution

It proposes a novel set-valued retrieval parameterization with marginalization during training to address false negatives and unanswerable questions in multi-document QA.

Findings

01. Improves IIRC F1 by 5.5 points to 50.5.

02. Achieves 4.1 F1 gain on HotpotQA with fullwiki retrieval.

03. Outperforms baseline models on two multi-document QA datasets (IIRC and HotpotQA).

Abstract

Question Answering (QA) tasks requiring information from multiple documents often rely on a retrieval model to identify relevant information for reasoning. The retrieval model is typically trained to maximize the likelihood of the labeled supporting evidence. However, when retrieving from large text corpora such as Wikipedia, the correct answer can often be obtained from multiple evidence candidates. Moreover, not all such candidates are labeled as positive during annotation, rendering the training signal weak and noisy. This problem is exacerbated when the questions are unanswerable or when the answers are Boolean, since the model cannot rely on lexical overlap to make a connection between the answer and supporting evidence. We develop a new parameterization of set-valued retrieval that handles unanswerable queries, and we show that marginalizing over this set during training allows a…
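
The contrast the abstract draws, between maximizing the likelihood of only the labeled evidence and marginalizing over a set of evidence candidates, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy probabilities, and the assumption that the loss is a negative log of a sum over candidate evidence sets of P(set | question) * P(answer | question, set) are all illustrative, since the excerpt above does not spell out the exact parameterization.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_nll(log_p_sets, log_p_answer_given_set):
    """Negative log of sum_z P(z | q) * P(a | q, z) over candidate evidence sets z.

    Both arguments are lists of log-probabilities, one entry per candidate set
    (hypothetical inputs; a real model would produce them).
    """
    joint = [lr + la for lr, la in zip(log_p_sets, log_p_answer_given_set)]
    return -log_sum_exp(joint)

def labeled_only_nll(log_p_sets, log_p_answer_given_set, labeled_index):
    """Baseline loss that credits only the single annotated evidence set."""
    return -(log_p_sets[labeled_index] + log_p_answer_given_set[labeled_index])

# Toy example: candidate 1 is an unlabeled set that also supports the answer
# (a false negative in the annotations); candidate 2 does not support it.
log_p_sets = [math.log(p) for p in (0.5, 0.3, 0.2)]
log_p_answer = [math.log(p) for p in (0.8, 0.8, 0.1)]

print("marginalized loss:", marginal_nll(log_p_sets, log_p_answer))
print("labeled-only loss:", labeled_only_nll(log_p_sets, log_p_answer, 0))
```

In this toy setup the marginalized loss is never larger than the labeled-only loss, because probability mass placed on the unlabeled but valid candidate still counts toward the answer instead of being penalized.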

Code & Models

Repositories

niansong1996/retrieval_marginalization (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications