Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)
Yuchen Li, Haoyi Xiong, Linghe Kong, Jiang Bian, Shuaiqiang Wang, Guihai Chen, Dawei Yin

TL;DR
This paper introduces GS2P, a generative semi-supervised pre-trained ranking model that addresses data scarcity and overfitting in web search ranking, demonstrating significant real-world improvements.
Contribution
The paper presents a novel GS2P model that combines generative pre-training with semi-supervised learning to enhance web search ranking performance.
Findings
GS2P outperforms existing models on public and real-world datasets.
Deployment of GS2P in a large-scale search engine improves search relevance.
The model effectively handles diverse query popularities and reduces overfitting.
Abstract
Learning to rank (LTR) is widely employed in web searches to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a Generative Semi-Supervised Pre-trained (GS2P) LTR model. We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine. Furthermore, we deploy GS2P in a large-scale web…
Taxonomy
Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Data Management and Algorithms
