Semantic Models for the First-stage Retrieval: A Comprehensive Review

Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, and Xueqi Cheng

arXiv:2103.04831 · cs.IR · September 20, 2021

TL;DR

This paper reviews the development of semantic models for first-stage retrieval in search systems, highlighting their evolution from classical methods to neural approaches and discussing future challenges and directions.

Contribution

It provides a comprehensive survey of first-stage semantic retrieval models, unifying various approaches and analyzing their connections and differences.

Findings

01. Semantic models improve recall in first-stage retrieval.

02. Neural semantic retrieval methods are rapidly evolving.

03. Open challenges include efficiency and robustness of models.

Abstract

Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval returns a subset of candidate documents and later stages attempt to re-rank those candidates. Unlike the re-ranking stages, which have gone through rapid shifts in technique over the past decades, the first-stage retrieval has long been dominated by classical term-based models. Unfortunately, these models suffer from the vocabulary mismatch problem, which may block the re-ranking stages from relevant documents at the very beginning. Therefore, it has been a long-standing goal to build semantic models for the first-stage retrieval that can achieve high recall efficiently. Recently, we have witnessed an explosive growth of research interest in first-stage semantic retrieval models. We believe it is the right time to survey the current status, learn from existing methods, and gain some…
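The vocabulary mismatch problem described above can be illustrated with a toy sketch. This is not code from the paper or its repository: the function names (`tokenize`, `term_overlap_score`, `first_stage`), the three-document corpus, and the raw term-count scoring are all assumptions made for illustration, standing in for a real term-based model such as BM25.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def term_overlap_score(query, doc):
    """Sum how often each query term occurs in the document.

    Hypothetical stand-in for a classical term-based scorer; real systems
    would use a weighted model such as BM25 over an inverted index.
    """
    doc_counts = Counter(tokenize(doc))
    return sum(doc_counts[term] for term in tokenize(query))


def first_stage(query, corpus, k):
    """Return indices of up to k documents with non-zero overlap, best first."""
    scored = [(term_overlap_score(query, doc), i) for i, doc in enumerate(corpus)]
    matched = [(score, i) for score, i in scored if score > 0]
    matched.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in matched[:k]]


# Illustrative corpus: document 1 is relevant to the query below but
# shares no surface term with it.
corpus = [
    "cheap car rental deals",
    "affordable automobile hire offers",
    "history of the bicycle",
]

candidates = first_stage("cheap car rental", corpus, k=2)
```

Under these assumptions, document 1 never enters `candidates`, so no later re-ranking stage can recover it, however strong that re-ranker is. This is the recall gap that first-stage semantic models aim to close.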

Code & Models

Repositories

caiyinqiong/Semantic-Retrieval-Models
Official
