Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search
Xiaojie Sun, Lulu Yu, Yiting Wang, Keping Bi, Jiafeng Guo

TL;DR
This paper presents an ensemble ranking model that integrates multiple pretraining strategies and bias mitigation techniques to improve web search relevance, placing 3rd in the WSDM Cup 2023 pre-training task.
Contribution
The study introduces a novel combination of bias reduction, heuristic features, and ensemble methods for large-scale pre-trained models in web search ranking.
Findings
The approach placed 3rd in the WSDM Cup 2023 pre-training task.
It outperformed the 4th-ranked team by 22.6%.
It combines bias mitigation during pre-training with an ensemble over several fine-tuned pre-trained models.
Abstract
An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain substantial bias and noise. There has been some work on mitigating various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, how to effectively use such methods on large-scale pre-trained models with real-world click data is unknown. To alleviate the data bias in the real world, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. Then we fine-tune several pre-trained models and train an ensemble model to aggregate all the predictions from various pre-trained models…
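The abstract mentions calibrating the propensity calculation but, as excerpted here, does not give the propensity model or the loss. As a rough illustration of the general idea only, the sketch below applies inverse propensity weighting to a listwise softmax loss over one query's clicked documents. The position-based propensity `(1 / rank) ** eta`, the clipping threshold `min_propensity`, and both function names are assumptions made for this example, not the authors' formulation.

```python
import math


def position_propensity(rank, eta=1.0):
    """Assumed examination propensity for a 1-based rank: (1 / rank) ** eta."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return (1.0 / rank) ** eta


def ipw_click_loss(scores, clicks, ranks, eta=1.0, min_propensity=0.1):
    """Softmax cross-entropy for one query, where each clicked document is
    weighted by the inverse of its clipped examination propensity.

    scores: model scores, one per document
    clicks: 0/1 click labels, one per document
    ranks:  1-based display positions at which the documents were shown
    """
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))

    loss = 0.0
    for score, clicked, rank in zip(scores, clicks, ranks):
        if clicked:
            # Clipping bounds the weight so low-ranked clicks cannot dominate.
            weight = 1.0 / max(position_propensity(rank, eta), min_propensity)
            loss += weight * (log_z - score)
    return loss
```

Under these assumptions, a click at rank 2 counts twice as much as a click at rank 1 when `eta=1.0`, which compensates for users examining lower positions less often. Clipping trades a small amount of bias for lower variance in the weights.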
Taxonomy
Topics: Information Retrieval and Search Behavior · Text and Document Classification Technologies · Image Retrieval and Classification Techniques
