Towards Two-Stage Counterfactual Learning to Rank
Shashank Gupta, Yiming Liao, and Maarten de Rijke

TL;DR
This paper introduces a novel two-stage counterfactual learning to rank method that jointly trains candidate generators and rankers, addressing scalability issues in large document sets and improving ranking performance.
Contribution
It proposes the first joint CLTR estimator and learning method for two-stage ranking systems, considering interactions between candidate generation and ranking.
Findings
The joint CLTR method outperforms baselines on semi-synthetic benchmarks.
It targets large-scale ranking systems, where the candidate document set is on the order of millions.
It corrects for interaction biases, such as position bias, in two-stage ranking pipelines.
Abstract
Counterfactual learning to rank (CLTR) aims to learn a ranking policy from user interactions while correcting for the inherent biases in interaction data, such as position bias. Existing CLTR methods assume a single ranking policy that selects a top-K ranking from the entire document candidate set. In real-world applications, the candidate document set is on the order of millions, making a single-stage ranking policy impractical. To scale to millions of documents, real-world ranking systems are designed in a two-stage fashion, with a candidate generator followed by a ranker. The existing CLTR method for a two-stage offline ranking system considers only the top-1 ranking setup and trains only the candidate generator, with the ranker fixed. A CLTR method for jointly training both the ranker and the candidate generator is missing from the existing literature. In this…
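The two-stage setup and the position-bias correction described in the abstract can be illustrated with a minimal sketch. This is not the paper's joint estimator: it shows only a generic generator-then-ranker pipeline and a standard inverse-propensity-scored (IPS) click sum. The function names, the score lists, and the examination probabilities are all hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence


def two_stage_rank(
    gen_scores: Sequence[float],
    rank_scores: Sequence[float],
    num_candidates: int,
    k: int,
) -> List[int]:
    """Return the top-k document ids produced by a two-stage pipeline.

    Stage 1 (candidate generator): keep the num_candidates documents with
    the highest generator scores.
    Stage 2 (ranker): order only those candidates by ranker score.
    """
    doc_ids = range(len(gen_scores))
    candidates = sorted(doc_ids, key=lambda d: gen_scores[d], reverse=True)
    candidates = candidates[:num_candidates]
    return sorted(candidates, key=lambda d: rank_scores[d], reverse=True)[:k]


def ips_weighted_clicks(
    ranking: Sequence[int],
    clicks: Dict[int, int],
    examination_probs: Sequence[float],
) -> float:
    """Sum logged clicks, each divided by the (assumed known) probability
    that the user examined the position where the document was shown."""
    return sum(
        clicks.get(doc, 0) / examination_probs[pos]
        for pos, doc in enumerate(ranking)
    )


# Hypothetical scores for five documents (ids 0..4).
gen_scores = [0.9, 0.1, 0.8, 0.7, 0.2]
rank_scores = [0.2, 0.99, 0.5, 0.9, 0.3]

ranking = two_stage_rank(gen_scores, rank_scores, num_candidates=3, k=2)
value = ips_weighted_clicks(ranking, clicks={2: 1}, examination_probs=[1.0, 0.5])
```

In this toy example, document 1 has the highest ranker score but never reaches the ranker because the generator filters it out. This kind of interaction between the two stages is what motivates training them jointly rather than in isolation.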
Taxonomy
Methods: Sparse Evolutionary Training
