Pre-trained Language Model based Ranking in Baidu Search
Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Zhifan Zhu, Weiyue Su, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin

TL;DR
This paper presents techniques for deploying pre-trained language models like ERNIE in Baidu's search engine, addressing computational, relevance modeling, and system compatibility challenges to improve online ranking effectiveness.
Contribution
The paper introduces cost-efficient document summarization, relevance-oriented pre-training with noisy data, and human-anchored fine-tuning strategies for PLM-based ranking in large-scale search systems.
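To make the idea of cost-efficient summarization concrete, here is a minimal sketch of query-aware extractive summarization: it keeps only the sentences of a document that share the most terms with the query, so a downstream ranker sees a much shorter input. This is a toy token-overlap heuristic written for illustration, not the paper's actual summarization algorithm; the names `tokenize` and `summarize_for_query` and the `max_sentences` parameter are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def summarize_for_query(query, document, max_sentences=2):
    """Keep the sentences that share the most terms with the query.

    Assumption: plain token overlap stands in for whatever relevance
    signal a production system would use to pick sentences.
    """
    query_terms = set(tokenize(query))
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    scored = []
    for position, sentence in enumerate(sentences):
        overlap = len(query_terms & set(tokenize(sentence)))
        scored.append((overlap, position, sentence))

    # Highest overlap first; earlier sentences win ties.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore the original document order so the summary reads naturally.
    top.sort(key=lambda item: item[1])
    return " ".join(sentence for _, _, sentence in top)


if __name__ == "__main__":
    doc = (
        "Baidu is a search engine. Cats sleep a lot. "
        "Ranking models score documents for a query. The weather is nice."
    )
    print(summarize_for_query("ranking models for search", doc))
```

The appeal of this kind of step is that the summary length, not the full document length, bounds the cost of the expensive PLM forward pass.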
Findings
Significant offline and online performance improvements.
Effective summarization and contextualization of web documents.
Enhanced stability of ranking signals across components.
Abstract
As the heart of a search engine, the ranking system plays a crucial role in satisfying users' information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web-document, prohibit their deployments in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: ERNIE
