Generative Pre-trained Ranking Model with Over-parameterization at Web-Scale (Extended Abstract)

Yuchen Li; Haoyi Xiong; Linghe Kong; Jiang Bian; Shuaiqiang Wang; Guihai Chen; Dawei Yin

arXiv:2409.16594·cs.IR·September 26, 2024

Open Access

TL;DR

This paper introduces GS2P, a generative semi-supervised pre-trained ranking model that addresses data scarcity and overfitting in web search ranking, demonstrating significant real-world improvements.

Contribution

The paper presents a novel GS2P model that combines generative pre-training with semi-supervised learning to enhance web search ranking performance.

Findings

01. GS2P outperforms existing models on public and real-world datasets.

02. Deployment of GS2P in a large-scale search engine improves search relevance.

03. The model effectively handles diverse query popularities and reduces overfitting.

Abstract

Learning to rank (LTR) is widely employed in web search to prioritize pertinent webpages from retrieved content based on input queries. However, traditional LTR models encounter two principal obstacles that lead to suboptimal performance: (1) the lack of well-annotated query-webpage pairs with ranking scores covering a diverse range of search query popularities, which hampers their ability to address queries across the popularity spectrum, and (2) inadequately trained models that fail to induce generalized representations for LTR, resulting in overfitting. To address these challenges, we propose a Generative Semi-Supervised Pre-trained (GS2P) LTR model. We conduct extensive offline experiments on both a publicly available dataset and a real-world dataset collected from a large-scale search engine. Furthermore, we deploy GS2P in a large-scale web…
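
For readers new to the learning-to-rank setting the abstract refers to, the following is a minimal, generic sketch of scoring and ordering webpages for one query, together with a pairwise logistic loss over annotated relevance labels. It is not the GS2P model; the function names, the toy labels, and the choice of a pairwise logistic loss are all illustrative assumptions that do not come from the paper.

```python
import math


def pairwise_logistic_loss(scores, labels):
    """Average logistic loss over pairs (i, j) with labels[i] > labels[j].

    Hypothetical helper for illustration: `scores` are model outputs for the
    webpages retrieved for one query, `labels` are annotated relevance grades.
    """
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # The loss is small when the more relevant page scores higher.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    # With no ordered pairs (all labels equal) there is nothing to learn from.
    return total / pairs if pairs else 0.0


def rank(pages, scores):
    """Return pages ordered by descending score."""
    ordered = sorted(zip(pages, scores), key=lambda t: t[1], reverse=True)
    return [page for page, _ in ordered]


# Toy data (assumed, not from the paper): three webpages for one query.
pages = ["a", "b", "c"]
labels = [2, 0, 1]               # higher means more relevant
good_scores = [3.0, -1.0, 1.0]   # agrees with the labels
bad_scores = [-1.0, 3.0, 1.0]    # puts the least relevant page first
```

A scorer whose ordering agrees with the labels incurs a lower loss than one that inverts it. When few queries carry such labels, as the first obstacle in the abstract describes, this supervised signal is only available for a small part of the query popularity spectrum.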

Taxonomy

Topics: Topic Modeling · Bayesian Modeling and Causal Inference · Data Management and Algorithms