Simple Applications of BERT for Ad Hoc Document Retrieval

Wei Yang, Haotian Zhang, and Jimmy Lin

arXiv:1903.10972·cs.IR·March 27, 2019·133 cites

TL;DR

This paper demonstrates that applying BERT to individual sentences and aggregating their scores is an effective and simple method for ad hoc document retrieval, outperforming previous neural approaches on TREC datasets.

Contribution

The paper introduces a straightforward method of using BERT for document retrieval by scoring sentences individually and aggregating results, addressing input length limitations.

Findings

01 Reports the highest average precision on the TREC microblog and newswire test collections among the neural approaches known to the authors.

02 Simple sentence-level application of BERT outperforms more complex neural models.

03 The method is effective despite BERT's input length constraints.

Abstract

Following recent successes in applying BERT to question answering, we explore simple applications to ad hoc document retrieval. This required confronting the challenge posed by documents that are typically longer than the length of input BERT was designed to handle. We address this issue by applying inference on sentences individually, and then aggregating sentence scores to produce document scores. Experiments on TREC microblog and newswire test collections show that our approach is simple yet effective, as we report the highest average precision on these datasets by neural approaches that we are aware of.
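The sentence-level scoring and aggregation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the aggregation function, so the weighted sum over the top-scoring sentences is an assumption, as are the names `split_sentences`, `overlap_score`, and `score_document`, the naive regex sentence splitter, and the term-overlap scorer that stands in for a BERT relevance model.

```python
import re
from typing import Callable, List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace (assumed, for illustration)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(query: str, sentence: str) -> float:
    """Placeholder scorer: fraction of query terms found in the sentence.

    Stands in for the relevance score a BERT model would assign to a
    (query, sentence) pair; it is NOT the model used in the paper.
    """
    query_terms = set(query.lower().split())
    sentence_terms = set(re.findall(r"\w+", sentence.lower()))
    return len(query_terms & sentence_terms) / len(query_terms) if query_terms else 0.0


def score_document(
    query: str,
    document: str,
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 3,
    weights: Optional[List[float]] = None,
) -> float:
    """Score each sentence on its own, then aggregate into one document score.

    Aggregation here is a weighted sum of the top_k sentence scores
    (an assumed choice; other aggregations such as max or mean also fit
    the abstract's description).
    """
    sentences = split_sentences(document)
    # Each sentence is short enough to fit the scorer's input limit by itself.
    top_scores = sorted((score_fn(query, s) for s in sentences), reverse=True)[:top_k]
    if weights is None:
        weights = [1.0] * len(top_scores)
    return sum(w * s for w, s in zip(weights, top_scores))


doc = "BERT handles short inputs. Documents are long. We score sentences with BERT."
print(score_document("bert sentences", doc, top_k=2, weights=[1.0, 0.5]))
```

Because each sentence is scored independently, the length of the full document never reaches the scorer, which is how this design sidesteps the input length limit. Swapping `overlap_score` for a function that wraps a fine-tuned BERT classifier would leave the aggregation step unchanged.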

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Expert finding and Q&A systems

Methods: Linear Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece · Softmax