Confidence-Calibrated Ensemble Dense Phrase Retrieval
William Yang, Noah Bergam, Arnav Jain, Nima Sheikhoslami

TL;DR
This paper enhances Dense Passage Retrieval by applying confidence-calibrated ensemble methods across multiple phrase lengths, achieving state-of-the-art results without additional pre-training.
Contribution
It introduces a novel ensemble approach over various phrase segmentations to optimize DPR performance across different datasets.
Findings
Achieves state-of-the-art results on Google NQ and SQuAD datasets.
Different phrase granularities are optimal for different domains.
Ensemble over multiple segmentations improves retrieval accuracy.
Abstract
In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by Karpukhin et al. (2020), can be optimized without further pre-training. Our method involves two particular insights: we apply the DPR context encoder at various phrase lengths (e.g. one-sentence versus five-sentence segments), and we take a confidence-calibrated ensemble prediction over all of these different segmentations. This somewhat exhaustive approach achieves state-of-the-art results on benchmark datasets such as Google NQ and SQuAD. We also apply our method to domain-specific datasets, and the results suggest that different granularities are optimal for different domains.
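The two ideas in the abstract, segmenting at several phrase lengths and ensembling over calibrated confidences, can be illustrated with a minimal sketch. It is not the authors' implementation. Everything below is an assumption made for illustration: `toy_encode` is a bag-of-words stand-in for the DPR question and context encoders, the calibration step is a per-granularity sigmoid (Platt-style) with hypothetical parameters `(a, b)`, and the ensemble simply keeps the candidate with the highest calibrated confidence. The abstract does not specify which calibration or combination rule the paper uses.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on sentence-ending punctuation; a stand-in for a real tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment(sentences, k):
    # Non-overlapping windows of k sentences each (the last window may be shorter).
    return [" ".join(sentences[i:i + k]) for i in range(0, len(sentences), k)]


def toy_encode(text):
    # Hypothetical placeholder for a DPR encoder: an L2-normalized bag-of-words vector.
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def dot(a, b):
    # Dot-product relevance score between two sparse vectors.
    return sum(v * b.get(token, 0.0) for token, v in a.items())


def calibrate(score, a, b):
    # Assumed calibration: map a raw score to a confidence with a sigmoid.
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def ensemble_retrieve(question, document, lengths=(1, 2, 5), calibration=None):
    # calibration maps each phrase length k to (a, b); in practice these would be
    # fit on held-out data. The identity-like default (1.0, 0.0) is an assumption.
    calibration = calibration or {}
    q = toy_encode(question)
    sentences = split_sentences(document)
    candidates = []
    for k in lengths:
        a, b = calibration.get(k, (1.0, 0.0))
        for passage in segment(sentences, k):
            confidence = calibrate(dot(q, toy_encode(passage)), a, b)
            candidates.append((confidence, k, passage))
    if not candidates:
        return None
    # Ensemble rule (assumed): keep the most confident candidate across all granularities.
    return max(candidates, key=lambda c: c[0])


if __name__ == "__main__":
    doc = ("Paris is the capital of France. Berlin is the capital of Germany. "
           "Tokyo is the capital of Japan.")
    print(ensemble_retrieve("What is the capital of Germany?", doc))
```

Calibrating per granularity matters because raw retrieval scores from different segment lengths are not directly comparable. Mapping each to a confidence before comparing is what allows a one-sentence candidate to compete with a five-sentence one.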
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
