Confidence-Calibrated Ensemble Dense Phrase Retrieval

William Yang; Noah Bergam; Arnav Jain; Nima Sheikhoslami

arXiv:2306.15917·cs.CL·June 29, 2023

Open Access

TL;DR

This paper enhances Dense Passage Retrieval by applying confidence-calibrated ensemble methods across multiple phrase lengths, achieving state-of-the-art results without additional pre-training.

Contribution

It introduces a novel ensemble approach over various phrase segmentations to optimize DPR performance across different datasets.

Findings

1. Achieves state-of-the-art results on the Google NQ and SQuAD datasets.

2. Different phrase granularities are optimal for different domains.

3. Ensembling over multiple segmentations improves retrieval accuracy.

Abstract

In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by Karpukhin et al. (2020), can be optimized without further pre-training. Our method involves two particular insights: we apply the DPR context encoder at various phrase lengths (e.g. one-sentence versus five-sentence segments), and we take a confidence-calibrated ensemble prediction over all of these different segmentations. This somewhat exhaustive approach achieves state-of-the-art results on benchmark datasets such as Google NQ and SQuAD. We also apply our method to domain-specific datasets, and the results suggest that different granularities are optimal for different domains.
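
The two ideas in the abstract (segmenting at several phrase lengths, then ensembling by confidence) can be illustrated with a minimal sketch. It is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for the DPR encoders. The per-granularity softmax temperature is an assumed calibration scheme. Every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter on sentence-ending punctuation (stand-in for a real tokenizer).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def segment(sentences, k):
    # Non-overlapping windows of k sentences each; the last window may be shorter.
    return [" ".join(sentences[i:i + k]) for i in range(0, len(sentences), k)]


def embed(text):
    # ASSUMPTION: toy bag-of-words vector in place of the DPR question/context encoders.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def dot(a, b):
    # Inner product of two sparse count vectors.
    return sum(a[t] * b[t] for t in a if t in b)


def softmax(scores, temperature):
    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def ensemble_retrieve(question, document, temperatures):
    """Return (confidence, k, segment) for the most confident granularity.

    `temperatures` maps a segment length k to a softmax temperature.
    ASSUMPTION: temperature scaling is used here as the calibration step;
    the values would be tuned on held-out data.
    """
    sentences = split_sentences(document)
    q = embed(question)
    best = None
    for k, temp in temperatures.items():
        segments = segment(sentences, k)
        scores = [dot(q, embed(seg)) for seg in segments]
        probs = softmax(scores, temp)
        top = max(range(len(segments)), key=probs.__getitem__)
        candidate = (probs[top], k, segments[top])
        if best is None or candidate[0] > best[0]:
            best = candidate
    return best


document = (
    "Paris is the capital of France. Berlin is the capital of Germany. "
    "Tokyo is the capital of Japan. Cairo is the capital of Egypt."
)
result = ensemble_retrieve("What is the capital of Japan?", document, {1: 1.0, 2: 1.0})
print(result)
```

Coarser segmentations produce fewer candidates, so their raw softmax confidences tend to be higher. This is one reason the scores need calibrating before they are compared across granularities.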

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies