TL;DR
This paper adapts paragraph-level question answering models to whole-document input by training them to produce well-calibrated confidence scores on individual paragraphs, raising F1 on the web portion of TriviaQA from 56.7 to 71.3.
Contribution
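The paper proposes a shared-normalization training objective, applied to multiple paragraphs sampled from each document, and combines it with a pipeline for training models on document QA data, so that confidence scores are comparable across the paragraphs of a full document.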
Findings
Achieved 71.3 F1 on the web portion of TriviaQA
Improved on the previous best system (56.7 F1) by 14.6 F1
Demonstrated strong performance on several document QA datasets
Abstract
We consider the problem of adapting neural paragraph-level question answering models to the case where entire documents are given as input. Our proposed solution trains models to produce well calibrated confidence scores for their results on individual paragraphs. We sample multiple paragraphs from the documents during training, and use a shared-normalization training objective that encourages the model to produce globally correct output. We combine this method with a state-of-the-art pipeline for training models on document QA data. Experiments demonstrate strong performance on several document QA datasets. Overall, we are able to achieve a score of 71.3 F1 on the web portion of TriviaQA, a large improvement from the 56.7 F1 of the previous best system.
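The shared-normalization idea in the abstract can be illustrated with a small sketch. It rests on an assumption the abstract does not spell out: that "shared normalization" means a single softmax over the candidate scores of all paragraphs sampled from the same document, rather than one softmax per paragraph. The function names, the per-token score arrays, and the boolean answer masks are hypothetical and chosen for illustration only; a span-prediction model would typically apply the same idea to start and end scores separately.

```python
import numpy as np

def shared_norm_probs(paragraph_scores):
    """Softmax over the scores of ALL paragraphs at once (assumed reading of
    'shared normalization'). Returns one probability array per paragraph."""
    flat = np.concatenate([np.asarray(s, dtype=float) for s in paragraph_scores])
    shifted = flat - flat.max()          # subtract the max for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum()              # one normalizer shared by every paragraph
    # Split the flat probability vector back into per-paragraph pieces.
    split_points = np.cumsum([len(s) for s in paragraph_scores])[:-1]
    return np.split(probs, split_points)

def shared_norm_loss(paragraph_scores, answer_masks):
    """Negative log of the total probability placed on correct positions,
    summed across all paragraphs of the document."""
    probs = shared_norm_probs(paragraph_scores)
    correct_mass = sum(p[np.asarray(m, dtype=bool)].sum()
                       for p, m in zip(probs, answer_masks))
    return -np.log(correct_mass)

# Hypothetical example: paragraph 0 contains the answer at position 0,
# paragraph 1 contains no answer at all.
scores = [np.array([2.0, 0.5, 0.1]), np.array([0.3, 0.2])]
masks = [np.array([True, False, False]), np.array([False, False])]
print(shared_norm_loss(scores, masks))
```

Because the normalizer is shared, raising the scores in the answer-free paragraph lowers the probability left for the correct position and so increases the loss. Under this reading, that is how the objective pushes the model toward confidence scores that can be compared across paragraphs.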
