Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval
Peitian Zhang, Zheng Liu, Shitao Xiao, Zhicheng Dou, Jing Yao

TL;DR
The paper introduces the Hybrid Inverted Index (HI$^2$), a novel approach combining embedding clustering and salient term matching to enhance dense retrieval speed and accuracy without quality loss.
Contribution
It proposes a collaborative framework using embedding clusters and salient terms, with learned selectors, to improve dense retrieval efficiency and effectiveness.
Findings
Achieves lossless retrieval quality with competitive efficiency
Effectively combines clustering and lexical matching for dense retrieval
Demonstrates superior performance on popular benchmarks
Abstract
Inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during searching, it probes nearby clusters w.r.t. an input query and only evaluates documents within them by subsequent codecs, thus avoiding the expensive cost of exhaustive traversal. However, the clustering is always lossy, which results in the miss of relevant documents in the probed clusters and hence degrades retrieval quality. In contrast, lexical matching, such as overlaps of salient terms, tends to be strong feature for identifying relevant documents. In this work, we present the Hybrid Inverted Index (HI), where the embedding clusters and salient terms work collaboratively to accelerate dense retrieval. To make best of both effectiveness and efficiency, we devise a cluster selector and a term selector, to construct compact inverted lists and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques
