IRLI: Iterative Re-partitioning for Learning to Index

Gaurav Gupta; Tharun Medini; Anshumali Shrivastava; Alexander J Smola

arXiv:2103.09944 · cs.IR · March 19, 2021

TL;DR

IRLI is an iterative, learned partitioning method for neural information retrieval. It learns which buckets are relevant to each item directly from query-item relevance data, and is reported to outperform existing approaches in both speed and precision on large-scale datasets.

Contribution

IRLI combines iterative re-partitioning of the items with a power-of-k-choices load balancing strategy, aiming to improve both retrieval accuracy and efficiency in learned indexing.

Findings

1. IRLI achieves 5x faster inference than baseline methods.

2. IRLI surpasses NeuralLSH in recall with fewer candidates.

3. IRLI outperforms FAISS on large-scale vector indexing.

Abstract

Neural models have transformed the fundamental information retrieval problem of mapping a query to a giant set of items. However, the need for efficient, low-latency inference forces the community to reconsider efficient approximate near-neighbor search in the item space. To this end, learning to index has recently gained much interest. Methods have to trade off high accuracy against load balance and scalability in distributed settings. We propose a novel approach called IRLI (pronounced "early"), which iteratively partitions the items by learning the relevant buckets directly from the query-item relevance data. Furthermore, IRLI employs a superior power-of-$k$-choices based load balancing strategy. We mathematically show that IRLI retrieves the correct item with high probability under very natural assumptions and provides superior load balancing. IRLI…

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies · Machine Learning and Algorithms