Understanding the Behaviors of BERT in Ranking
Yifan Qiao, Chenyan Xiong, Zhenghao Liu, Zhiyuan Liu

TL;DR
This paper investigates BERT's performance and behaviors in ranking tasks, highlighting its strengths in question-answering and its limitations in ad hoc document ranking, with detailed analysis of its attention mechanisms.
Contribution
It analyzes BERT's behaviors in ranking, including how it allocates attention between query and document tokens and how it matches semantics, and contrasts what pre-training on surrounding contexts provides with what ad hoc document ranking needs.
Findings
BERT is highly effective in question-answering passage ranking.
BERT prefers semantic matches between paraphrase tokens in the query and document.
Gaps exist between BERT's pre-training and ad hoc ranking requirements.
Abstract
This paper studies the performance and behaviors of BERT in ranking tasks. We explore several different ways to leverage the pre-trained BERT and fine-tune it on two ranking tasks: MS MARCO passage reranking and TREC Web Track ad hoc document ranking. Experimental results on MS MARCO demonstrate the strong effectiveness of BERT in question-answering focused passage ranking tasks, as well as the fact that BERT is a strong interaction-based seq2seq matching model. Experimental results on TREC show the gaps between the BERT pre-trained on surrounding contexts and the needs of ad hoc document ranking. Analyses illustrate how BERT allocates its attention between query-document tokens in its Transformer layers, how it prefers semantic matches between paraphrase tokens, and how that differs from the soft match patterns learned by a click-trained neural ranker.
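As a rough illustration of what "interaction-based" means here, the sketch below packs a query and a passage into the single paired input sequence that BERT accepts, so that self-attention can operate across tokens of both texts. It is not the authors' code: the function name build_pair_input, the whitespace tokenization (a stand-in for BERT's WordPiece tokenizer), and the max_len default are assumptions made for illustration.

```python
from typing import List, Tuple


def build_pair_input(query: str, passage: str,
                     max_len: int = 512) -> Tuple[List[str], List[int]]:
    """Pack a query and a passage into one BERT-style paired input.

    Whitespace splitting is an illustrative stand-in for WordPiece
    tokenization; a real pipeline would use the model's own tokenizer.
    """
    q_tokens = query.lower().split()
    p_tokens = passage.lower().split()

    # Reserve three positions for [CLS] and the two [SEP] markers,
    # then truncate the passage to whatever room is left.
    budget = max_len - len(q_tokens) - 3
    p_tokens = p_tokens[:max(budget, 0)]

    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segment 0 covers [CLS], the query, and the first [SEP];
    # segment 1 covers the passage and the final [SEP].
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(
    "what is bert", "BERT is a Transformer encoder"
)
```

In a typical fine-tuning setup, the final-layer vector at the [CLS] position of such a sequence is passed to a small scoring layer to produce the ranking score; the paper's specific variants may differ from this sketch.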
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Byte Pair Encoding · Dense Connections
