Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

Han Zhang; Hongwei Shen; Yiming Qiu; Yunjiang Jiang; Songlin Wang; Sulong Xu; Yun Xiao; Bo Long; Wen-Yun Yang

arXiv:2105.03933·cs.IR·May 31, 2021

TL;DR

This paper introduces Poeem, an end-to-end trainable deep retrieval system that learns the embedding model and its product quantization based index jointly, significantly improving retrieval accuracy and reducing indexing time to almost none.

Contribution

Poeem unifies embedding learning and index building in a single end-to-end training process, using a gradient straight-through estimator, a warm start strategy, optimal space decomposition, and Givens rotation.

Findings

01 Improves retrieval accuracy significantly

02 Reduces indexing time to nearly zero

03 Open sourced for reproducibility

Abstract

An embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component of state-of-the-art deep retrieval systems. Traditional approaches, which often separate the two steps of embedding learning and index building, incur additional indexing time and degraded retrieval accuracy. In this paper, we propose a novel method called Poeem, which stands for product quantization based embedding index jointly trained with deep retrieval model, to unify the two separate steps within a single end-to-end training process, by utilizing a few techniques including the gradient straight-through estimator, warm start strategy, optimal space decomposition and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open sourced our approach for the sake…
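
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal NumPy sketch of generic product quantization and of a single Givens rotation. It is not the Poeem implementation: the codebooks are random rather than learned, no training or straight-through gradient is shown, and the names `pq_encode`, `pq_decode`, and `givens_rotation` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Return one centroid index per subspace for each row of x.

    x: (n, d) embeddings; codebooks: (D, K, d // D) centroids.
    """
    n, d = x.shape
    D, K, sub = codebooks.shape
    assert d == D * sub, "embedding dim must equal D * subvector dim"
    parts = x.reshape(n, D, sub)  # split each embedding into D subvectors
    # Squared distance from every subvector to every centroid: (n, D, K)
    dists = ((parts[:, :, None, :] - codebooks[None]) ** 2).sum(axis=-1)
    return dists.argmin(axis=-1)  # (n, D) integer codes

def pq_decode(codes, codebooks):
    """Rebuild approximate embeddings by concatenating the chosen centroids."""
    D = codebooks.shape[0]
    chosen = codebooks[np.arange(D)[None, :], codes]  # (n, D, sub)
    return chosen.reshape(codes.shape[0], -1)

def givens_rotation(d, i, j, theta):
    """d x d rotation by angle theta in the (i, j) coordinate plane."""
    r = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    r[i, i], r[j, j] = c, c
    r[i, j], r[j, i] = -s, s
    return r

# Assumed toy sizes: D=4 subspaces, K=16 centroids each, d=8 dimensions.
rng = np.random.default_rng(0)
codebooks = rng.normal(size=(4, 16, 2))
x = rng.normal(size=(5, 8))

codes = pq_encode(x, codebooks)      # compact index entries, shape (5, 4)
x_hat = pq_decode(codes, codebooks)  # lossy reconstruction, shape (5, 8)
rot = givens_rotation(8, 0, 3, 0.3)  # orthogonal, so it preserves distances
```

Each embedding is stored as `D` small integers instead of `d` floats, which is what makes the index compact. Because a Givens rotation is orthogonal, applying it before quantization changes how dimensions are grouped into subspaces without changing distances between embeddings.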

Code & Models

Repositories

jdcomsearch/poeem · tf · Official
