Tradeoffs in Sentence Selection Techniques for Open-Domain Question Answering
Shih-Ting Lin, Greg Durrett

TL;DR
This paper investigates sentence selection techniques in open-domain QA, comparing QA-based and retrieval-based models, and introduces a hybrid ensemble that balances speed and accuracy across datasets.
Contribution
It systematically analyzes trade-offs between different sentence selection methods and proposes a hybrid ensemble approach for improved performance and efficiency.
Findings
Retrieval-based models are faster than QA-based models.
Lightweight QA models perform well in sentence selection.
Ensemble methods generalize effectively across domains.
Abstract
Current methods in open-domain question answering (QA) usually employ a pipeline of first retrieving relevant documents, then applying strong reading comprehension (RC) models to that retrieved text. However, modern RC models are complex and expensive to run, so techniques to prune the space of retrieved text are critical to allow this approach to scale. In this paper, we focus on approaches which apply an intermediate sentence selection step to address this issue, and investigate the best practices for this approach. We describe two groups of models for sentence selection: QA-based approaches, which run a full-fledged QA system to identify answer candidates, and retrieval-based models, which find parts of each passage specifically related to each question. We examine trade-offs between processing speed and task performance in these two approaches, and demonstrate an ensemble module…
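To make the intermediate sentence selection step concrete, here is a minimal sketch of a retrieval-style selector that prunes retrieved passages down to a few sentences before any RC model would run. It is not the authors' model: the IDF-weighted word-overlap score, the naive sentence splitter, and the names `tokenize`, `split_sentences`, `select_sentences`, and `top_k` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(passage):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]


def select_sentences(question, passages, top_k=2):
    """Rank all sentences by IDF-weighted overlap with the question; keep top_k.

    Hypothetical scoring function for illustration only, standing in for
    whatever retrieval-based selector a real system would use.
    """
    sentences = [s for p in passages for s in split_sentences(p)]
    token_sets = [set(tokenize(s)) for s in sentences]
    # Sentence-level document frequency of each token.
    df = Counter(tok for toks in token_sets for tok in toks)
    n = len(sentences)
    q_tokens = set(tokenize(question))

    def score(toks):
        # Rare shared words contribute more than common ones.
        return sum(math.log(1 + n / df[t]) for t in q_tokens & toks)

    # sorted() is stable, so ties keep their original passage order.
    ranked = sorted(range(n), key=lambda i: score(token_sets[i]), reverse=True)
    return [sentences[i] for i in ranked[:top_k]]


passages = [
    "Marie Curie was a physicist. She was born in Warsaw. She won two Nobel Prizes.",
    "Paris is the capital of France. The Seine flows through it.",
]
selected = select_sentences("Where was Marie Curie born?", passages, top_k=2)
```

Only the sentences in `selected` would be handed to the expensive RC model, which is where the speed gain comes from. A QA-based selector would instead score sentences with a lightweight QA system, trading some of that speed for accuracy.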
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
