RepBERT: Contextualized Text Embeddings for First-Stage Retrieval

Jingtao Zhan; Jiaxin Mao; Yiqun Liu; Min Zhang; Shaoping Ma

arXiv:2006.15498 · cs.IR · July 21, 2020 · 58 cites

TL;DR

RepBERT represents queries and documents as fixed-length contextualized embeddings for first-stage retrieval. It achieves state-of-the-art results among initial retrieval techniques on MS MARCO Passage Ranking, with efficiency comparable to traditional bag-of-words approaches.

Contribution

The paper proposes RepBERT, which scores relevance by the inner product of fixed-length contextualized query and document embeddings rather than by exact term matching.

Findings

01

Achieves state-of-the-art results among initial retrieval techniques on MS MARCO Passage Ranking

02

Maintains efficiency comparable to bag-of-words methods

03

Demonstrates the effectiveness of contextualized embeddings in first-stage retrieval

Abstract

Although exact term match between queries and documents is the dominant method to perform first-stage retrieval, we propose a different approach, called RepBERT, to represent documents and queries with fixed-length contextualized embeddings. The inner products of query and document embeddings are regarded as relevance scores. On the MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques, and its efficiency is comparable to bag-of-words methods.
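The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes the query and document embeddings have already been produced by some encoder; the vectors below are made-up toy values, and the function name `rank_by_inner_product` is hypothetical rather than taken from the paper or its code. Only the inner-product ranking is shown, not the encoder or any indexing strategy.

```python
import numpy as np


def rank_by_inner_product(query_emb, doc_embs, k=3):
    """Return (indices, scores) of the k documents with the largest inner product.

    query_emb: array of shape (dim,)
    doc_embs:  array of shape (num_docs, dim)
    """
    scores = doc_embs @ query_emb      # one inner product per document
    top = np.argsort(-scores)[:k]      # document indices, highest score first
    return top, scores[top]


# Toy stand-ins for encoder outputs (assumed values, not from the paper).
doc_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])
query_emb = np.array([1.0, 0.2, 0.0])

top_idx, top_scores = rank_by_inner_product(query_emb, doc_embs, k=2)
print(top_idx, top_scores)
```

Because every document is a fixed-length vector, the document embeddings can be computed once offline, so query time reduces to one matrix-vector product followed by a top-k selection.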

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning