An Analysis of Fusion Functions for Hybrid Retrieval

Sebastian Bruch; Siyu Gai; Amir Ingber

arXiv:2210.11934 · cs.IR · May 26, 2023 · 6 cites

Open Access

TL;DR

This paper compares fusion methods for hybrid text retrieval, showing that convex combination (CC) generally outperforms Reciprocal Rank Fusion (RRF) and is more sample-efficient and robust across domains.

Contribution

The paper analyzes fusion functions for hybrid retrieval, showing that CC outperforms RRF in both in-domain and out-of-domain settings and that CC's single parameter can be tuned from a small set of training examples.

Findings

01 CC outperforms RRF in in-domain and out-of-domain scenarios

02 RRF is sensitive to its parameters

03 CC requires only a small training set for parameter tuning

Abstract

We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
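
The two fusion functions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated on this page: min-max normalization is used for CC (the abstract only says the choice of normalization matters little), `alpha` weights the semantic score and `1 - alpha` the lexical score, a document missing from one result list contributes 0 from that list, and `eta = 60` is used as a commonly cited RRF default. The names `min_max`, `cc_fuse`, `rrf_fuse`, and `top_k` are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> retrieval score


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span > 0 else 0.0 for d, s in scores.items()}


def cc_fuse(lexical: Scores, semantic: Scores, alpha: float) -> Scores:
    """Convex combination: alpha * semantic + (1 - alpha) * lexical.

    Assumption: scores are min-max normalized per list, and a document
    absent from one list gets a normalized score of 0 for that list.
    """
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in docs}


def rrf_fuse(lexical: Scores, semantic: Scores, eta: float = 60.0) -> Scores:
    """Reciprocal Rank Fusion: sum over lists of 1 / (eta + rank).

    Assumption: ranks start at 1 and a document absent from a list
    contributes nothing for that list.
    """
    fused: Scores = {}
    for scores in (lexical, semantic):
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (eta + rank)
    return fused


def top_k(fused: Scores, k: int) -> List[str]:
    """Return the k document ids with the highest fused score."""
    return sorted(fused, key=fused.get, reverse=True)[:k]
```

Under these assumptions, CC uses the magnitude of each score while RRF uses only rank positions, so the two can order the same candidates differently. Tuning CC to a target domain would amount to searching over the single value `alpha` on a small labeled set.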

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Information Retrieval and Search Behavior · Text and Document Classification Technologies