An Analysis of Fusion Functions for Hybrid Retrieval
Sebastian Bruch, Siyu Gai, Amir Ingber

TL;DR
This paper compares fusion methods for hybrid text retrieval, showing that convex combination (CC) generally outperforms Reciprocal Rank Fusion (RRF) and is more sample-efficient and robust across domains.
Contribution
It analyzes fusion techniques for hybrid retrieval, showing where CC holds an advantage over RRF in both in-domain and out-of-domain settings, and that CC's single parameter can be tuned from a small set of training examples.
Findings
CC outperforms RRF in in-domain and out-of-domain scenarios
RRF is sensitive to its parameters
CC requires only a small training set for parameter tuning
Abstract
We study hybrid search in text retrieval where lexical and semantic search are fused together with the intuition that the two are complementary in how they model relevance. In particular, we examine fusion by a convex combination (CC) of lexical and semantic scores, as well as the Reciprocal Rank Fusion (RRF) method, and identify their advantages and potential pitfalls. Contrary to existing studies, we find RRF to be sensitive to its parameters; that the learning of a CC fusion is generally agnostic to the choice of score normalization; that CC outperforms RRF in in-domain and out-of-domain settings; and finally, that CC is sample efficient, requiring only a small set of training examples to tune its only parameter to a target domain.
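To make the two fusion functions concrete, here is a minimal Python sketch. It is not the authors' code. It assumes min-max score normalization for CC (the abstract reports that CC is generally agnostic to this choice), the commonly used RRF form that sums 1 / (eta + rank) over both rankings, and a default of eta = 60, which is a conventional value rather than one taken from this page. The function names, the `alpha` and `eta` parameter names, and the handling of documents missing from one list are all illustrative assumptions.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant (or single-item) list maps to zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def convex_combination(
    lexical: Dict[str, float], semantic: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """Fuse by alpha * semantic + (1 - alpha) * lexical on normalized scores.

    Assumption: a document missing from one list gets a normalized score of 0 there.
    """
    lex, sem = min_max(lexical), min_max(semantic)
    docs = set(lex) | set(sem)
    return {d: alpha * sem.get(d, 0.0) + (1 - alpha) * lex.get(d, 0.0) for d in docs}


def to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each document to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: i + 1 for i, doc in enumerate(ordered)}


def reciprocal_rank_fusion(
    lexical: Dict[str, float], semantic: Dict[str, float], eta: float = 60.0
) -> Dict[str, float]:
    """Fuse by summing 1 / (eta + rank) over both rankings.

    Assumption: a document missing from one list contributes nothing for that list.
    """
    fused: Dict[str, float] = {}
    for ranking in (to_ranks(lexical), to_ranks(semantic)):
        for doc, rank in ranking.items():
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (eta + rank)
    return fused


# Hypothetical scores for three documents (e.g. BM25 and cosine similarity).
lexical_scores = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
semantic_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.6}

cc_scores = convex_combination(lexical_scores, semantic_scores, alpha=0.5)
rrf_scores = reciprocal_rank_fusion(lexical_scores, semantic_scores, eta=60.0)
```

CC uses the score magnitudes after normalization, while RRF discards them and keeps only the rank positions. In this sketch, tuning CC to a target domain would amount to a one-dimensional search over `alpha` on a small set of labeled queries.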
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Information Retrieval and Search Behavior · Text and Document Classification Technologies
