To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

Hang Li; Shuai Wang; Shengyao Zhuang; Ahmed Mourad; Xueguang Ma; Jimmy Lin; Guido Zuccon

arXiv:2205.00235·cs.IR·May 3, 2022

TL;DR

This paper investigates how combining sparse and dense retrieval signals, especially through interpolation before and after pseudo relevance feedback, improves retrieval effectiveness across various models and datasets.

Contribution

It provides a comprehensive empirical evaluation of different interpolation strategies and sparse representations in the context of neural PRF for information retrieval.

Findings

1. Interpolation before and after PRF yields the best effectiveness.
2. Both zero-shot and learned sparse representations benefit from combined interpolation.
3. Effectiveness improvements are consistent across multiple datasets and retrieval models.
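The interpolation choices in the findings above (before PRF, after PRF, or both) can be sketched on toy data. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the min-max normalization, the weight `alpha`, and the averaging-based `prf_query` are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Dict, List

Vector = List[float]
Scores = Dict[str, float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so sparse and dense scores are comparable (assumed choice)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def interpolate(sparse: Scores, dense: Scores, alpha: float) -> Scores:
    """Weighted sum of normalized scores; alpha is the weight on the sparse score."""
    s, d = min_max(sparse), min_max(dense)
    return {doc: alpha * s[doc] + (1 - alpha) * d[doc] for doc in s}


def dense_scores(query_vec: Vector, doc_vecs: Dict[str, Vector]) -> Scores:
    return {doc: dot(query_vec, vec) for doc, vec in doc_vecs.items()}


def top_k(scores: Scores, k: int) -> List[str]:
    return sorted(scores, key=scores.get, reverse=True)[:k]


def prf_query(query_vec: Vector, doc_vecs: Dict[str, Vector], feedback: List[str]) -> Vector:
    """Hypothetical PRF step: average the query with the feedback document vectors."""
    vectors = [query_vec] + [doc_vecs[doc] for doc in feedback]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def retrieve(query_vec: Vector, doc_vecs: Dict[str, Vector], sparse: Scores, *,
             k: int = 2, alpha: float = 0.7,
             before: bool = True, after: bool = True) -> List[str]:
    # First pass: dense scores, optionally interpolated with sparse scores,
    # decide which documents are used as pseudo relevance feedback.
    first_pass = dense_scores(query_vec, doc_vecs)
    if before:
        first_pass = interpolate(sparse, first_pass, alpha)
    feedback = top_k(first_pass, k)

    # Second pass: score with the PRF-refined query, optionally interpolating again.
    second_pass = dense_scores(prf_query(query_vec, doc_vecs, feedback), doc_vecs)
    if after:
        second_pass = interpolate(sparse, second_pass, alpha)
    return top_k(second_pass, len(doc_vecs))


# Toy data (made up for illustration): the dense and sparse signals disagree.
doc_vecs = {"d1": [1.0, 0.0], "d2": [0.8, 0.6], "d3": [0.0, 1.0]}
sparse = {"d1": 2.0, "d2": 5.0, "d3": 9.0}
query = [1.0, 0.0]

for before in (False, True):
    for after in (False, True):
        ranking = retrieve(query, doc_vecs, sparse, before=before, after=after)
        print(f"before={before!s:5} after={after!s:5} -> {ranking}")
```

On this toy data, interpolating before PRF changes which documents feed the query refinement, while interpolating after PRF changes only the final ranking; the two switches are independent, which is why "both" is a distinct configuration.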

Abstract

Current pre-trained language model approaches to information retrieval can be broadly divided into two categories: sparse retrievers (which also include non-neural bag-of-words methods such as BM25) and dense retrievers. Each category appears to capture different characteristics of relevance. Previous work has investigated how relevance signals from sparse retrievers can be combined with those from dense retrievers via interpolation, which generally leads to higher retrieval effectiveness. In this paper, we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF). This context poses two key challenges: (1) When should interpolation occur: before, after, or both before and after the PRF process? (2) Which sparse representation should be considered: a…
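For orientation, score interpolation of the kind the abstract refers to is commonly written as a weighted sum of the two retrievers' scores. The form below is a generic sketch rather than a formula quoted from the paper: the weight $\alpha$ and the symbols $s_{\text{sparse}}$ and $s_{\text{dense}}$ are assumed notation, and whether the scores are normalized before combining is left open.

```latex
s(q, d) = \alpha \, s_{\text{sparse}}(q, d) + (1 - \alpha) \, s_{\text{dense}}(q, d),
\qquad \alpha \in [0, 1]
```

With $\alpha = 0$ the ranking relies on the dense retriever alone, and with $\alpha = 1$ on the sparse retriever alone.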
