Hybrid Inverted Index Is a Robust Accelerator for Dense Retrieval

Peitian Zhang; Zheng Liu; Shitao Xiao; Zhicheng Dou; Jing Yao

arXiv:2210.05521·cs.IR·October 18, 2023·1 cites

TL;DR

The paper introduces the Hybrid Inverted Index (HI$^2$), which combines embedding clustering with salient term matching to accelerate dense retrieval while preserving retrieval quality.

Contribution

It proposes a framework in which embedding clusters and salient terms work together, each governed by a learned selector (a cluster selector and a term selector), to make dense retrieval both efficient and effective.

Findings

01. Achieves lossless retrieval quality with competitive efficiency

02. Effectively combines clustering and lexical matching for dense retrieval

03. Demonstrates superior performance on popular benchmarks

Abstract

The inverted file structure is a common technique for accelerating dense retrieval. It clusters documents based on their embeddings; during search, it probes the clusters nearest to an input query and evaluates only the documents within them using subsequent codecs, thus avoiding the expensive cost of exhaustive traversal. However, the clustering is always lossy, so relevant documents can be missing from the probed clusters, which degrades retrieval quality. In contrast, lexical matching, such as overlap of salient terms, tends to be a strong feature for identifying relevant documents. In this work, we present the Hybrid Inverted Index (HI$^2$), where the embedding clusters and salient terms work collaboratively to accelerate dense retrieval. To make the best of both effectiveness and efficiency, we devise a cluster selector and a term selector, to construct compact inverted lists and…
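The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_hybrid_index`, `search`), the toy data, and the fixed centroids are all hypothetical. In particular, the learned cluster and term selectors are replaced here by simple heuristics (nearest centroid by inner product, and every listed term treated as salient), and the "subsequent codecs" are replaced by an exact inner product over the candidates.

```python
from collections import defaultdict


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_hybrid_index(doc_embs, doc_terms, centroids):
    """Build two kinds of inverted lists: cluster -> doc ids and term -> doc ids.

    Assumption: each document goes to its single nearest centroid, and all of
    its given terms are indexed. The paper learns both selections instead.
    """
    cluster_lists = defaultdict(list)
    term_lists = defaultdict(list)
    for doc_id, (emb, terms) in enumerate(zip(doc_embs, doc_terms)):
        nearest = max(range(len(centroids)), key=lambda c: dot(emb, centroids[c]))
        cluster_lists[nearest].append(doc_id)
        for term in terms:
            term_lists[term].append(doc_id)
    return cluster_lists, term_lists


def search(query_emb, query_terms, doc_embs, centroids,
           cluster_lists, term_lists, n_probe=1, top_k=3):
    """Collect candidates from probed clusters and matched terms, then rank them."""
    ranked_clusters = sorted(range(len(centroids)),
                             key=lambda c: dot(query_emb, centroids[c]),
                             reverse=True)
    candidates = set()
    for c in ranked_clusters[:n_probe]:
        candidates.update(cluster_lists.get(c, []))
    for term in query_terms:
        candidates.update(term_lists.get(term, []))
    # Only the candidates are scored, never the whole collection.
    ranked = sorted(candidates, key=lambda d: dot(query_emb, doc_embs[d]),
                    reverse=True)
    return ranked[:top_k]


# Toy data (hypothetical): four documents, two clusters.
centroids = [[1.0, 0.0], [0.0, 1.0]]
doc_embs = [[0.9, 0.1], [0.8, 0.3], [0.45, 0.55], [0.1, 0.9]]
doc_terms = [{"index"}, {"cluster"}, {"index", "retrieval"}, {"term"}]

cluster_lists, term_lists = build_hybrid_index(doc_embs, doc_terms, centroids)

query_emb = [1.0, 0.2]
cluster_only = search(query_emb, set(), doc_embs, centroids,
                      cluster_lists, term_lists)
hybrid = search(query_emb, {"retrieval"}, doc_embs, centroids,
                cluster_lists, term_lists)
print(cluster_only)  # [0, 1]
print(hybrid)        # [0, 1, 2]
```

In this toy setup, document 2 falls into the cluster that is not probed, so cluster-only search never sees it; the term list for "retrieval" brings it back into the candidate set. That is the kind of clustering loss the abstract says lexical matching can compensate for.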

Code & Models

Repositories

namespace-pt/adon (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques