Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
Tavish McDonald, Brian Tsan, Amar Saini, Juanita Ordonez, Luis Gutierrez, Phan Nguyen, Blake Mason, Brenda Ng

TL;DR
This paper introduces a three-stage framework for zero-shot document-level question answering that extracts text from scholarly PDFs, retrieves answer evidence, and comprehends it to produce answers, improving Answer-F1 by 7.19 over baselines.
Contribution
The paper proposes a novel detect-retrieve-comprehend framework that effectively handles long, ill-formatted documents for scientific QA without requiring labeled training data.
Findings
+7.19 improvement in Answer-F1 over baselines
Superior context selection in document QA
Effective handling of long, complex documents
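The Answer-F1 figure above is not defined on this page. QA evaluations commonly compute it as token-overlap F1 between a predicted answer and a gold answer; the sketch below illustrates that common definition only, assuming lowercasing and whitespace tokenization. The function name `token_f1` is hypothetical, and the paper's exact scoring script may normalize text differently.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string.

    Assumption: lowercase + whitespace tokenization. Real evaluation
    scripts often also strip punctuation and articles.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a prediction that contains the whole gold answer plus extra tokens keeps full recall but loses precision, so overly long contexts or answers are penalized.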
Abstract
Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence…
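The three stages named in the title (detect, retrieve, comprehend) can be pictured as a simple pipeline. The sketch below is a toy illustration, not the authors' system: it assumes the PDF text has already been extracted into a plain string, substitutes lexical overlap for a learned retriever, and substitutes best-sentence selection for a neural reader. All function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(document_text):
    """Stage 1 stand-in: treat blank-line-separated blocks as passages.

    A real system would detect and extract text regions from the PDF.
    """
    return [p.strip() for p in re.split(r"\n\s*\n", document_text) if p.strip()]


def overlap_score(question, text):
    """Count question tokens that also occur in the text (multiset overlap)."""
    return sum((Counter(tokenize(question)) & Counter(tokenize(text))).values())


def retrieve(question, passages, top_k=1):
    """Stage 2 stand-in: rank passages by lexical overlap with the question."""
    ranked = sorted(
        enumerate(passages),
        key=lambda item: (-overlap_score(question, item[1]), item[0]),
    )
    return [passage for _, passage in ranked[:top_k]]


def comprehend(question, context):
    """Stage 3 stand-in: return the context sentence that best matches."""
    sentences = re.split(r"(?<=[.!?])\s+", context)
    return max(sentences, key=lambda s: overlap_score(question, s))


def answer_question(document_text, question):
    """Run the three stages and return (selected context, answer)."""
    passages = split_passages(document_text)
    context = retrieve(question, passages, top_k=1)[0]
    return context, comprehend(question, context)
```

A usage example with a made-up document:

```python
doc = (
    "Intro paragraph about cats.\n\n"
    "The model was trained on 500 documents. It uses a transformer.\n\n"
    "Conclusion here."
)
context, answer = answer_question(doc, "How many documents was the model trained on?")
print(answer)
```

Keeping the stages as separate functions mirrors the modular framing in the title: each stand-in could be replaced by a stronger component without changing the others.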
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
