Towards Two-Stage Counterfactual Learning to Rank

Shashank Gupta, Yiming Liao, and Maarten de Rijke

arXiv:2506.20854 · cs.IR · January 8, 2026

TL;DR

This paper introduces a two-stage counterfactual learning to rank (CLTR) method that jointly trains the candidate generator and the ranker. It targets document collections too large for a single-stage ranking policy and reports improved ranking performance over baselines.

Contribution

The paper proposes the first joint CLTR estimator and learning method for two-stage ranking systems, one that accounts for the interaction between candidate generation and ranking.

Findings

01. The joint CLTR method outperforms baselines on semi-synthetic benchmarks.

02. The method is effective for large-scale document ranking systems.

03. It addresses bias correction in two-stage ranking pipelines.

Abstract

Counterfactual learning to rank (CLTR) aims to learn a ranking policy from user interactions while correcting for the inherent biases in interaction data, such as position bias. Existing CLTR methods assume a single ranking policy that selects a top-K ranking from the entire document candidate set. In real-world applications, the candidate document set is on the order of millions, making a single-stage ranking policy impractical. To scale to millions of documents, real-world ranking systems are designed in a two-stage fashion, with a candidate generator followed by a ranker. The existing CLTR method for a two-stage offline ranking system considers only the top-1 ranking set-up and trains only the candidate generator, with the ranker fixed. A CLTR method for training both the ranker and the candidate generator jointly is missing from the existing literature. In this…
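To make the two-stage set-up described in the abstract concrete, here is a minimal Python sketch. It is not the paper's joint estimator, whose details are not given in the text above. It only illustrates (a) a candidate generator that keeps M documents followed by a ranker that returns a top-K ranking, and (b) a standard inverse-propensity-scored (IPS) utility estimate under an assumed position-based examination model. All function names, scores, and examination probabilities are hypothetical.

```python
import math
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring document ids, best first (ties broken by id)."""
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


def two_stage_rank(gen_scores: Dict[str, float],
                   ranker_scores: Dict[str, float],
                   m: int, k: int) -> List[str]:
    """Stage 1 keeps m candidates by generator score; stage 2 orders
    only those candidates by ranker score and returns the top k."""
    candidates = top_k(gen_scores, m)
    return top_k({d: ranker_scores[d] for d in candidates}, k)


def ips_dcg_estimate(new_ranking: List[str],
                     logged_ranking: List[str],
                     clicks: Dict[str, int],
                     examine_prob: List[float]) -> float:
    """Generic IPS estimate of DCG for new_ranking from clicks logged on
    logged_ranking. examine_prob[i] is the (assumed known) probability
    that a user examines logged position i."""
    logged_pos = {d: i for i, d in enumerate(logged_ranking)}
    total = 0.0
    for rank, doc in enumerate(new_ranking):
        if clicks.get(doc, 0) and doc in logged_pos:
            dcg_weight = 1.0 / math.log2(rank + 2)
            # Reweight the click by the inverse of its examination propensity.
            total += dcg_weight / examine_prob[logged_pos[doc]]
    return total


# Hypothetical scores: document "c" has the best ranker score but is
# dropped by the candidate generator, so the ranker never sees it.
gen_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
ranker_scores = {"a": 0.2, "b": 0.9, "c": 1.0, "d": 0.5}
ranking = two_stage_rank(gen_scores, ranker_scores, m=3, k=2)

estimate = ips_dcg_estimate(
    new_ranking=ranking,
    logged_ranking=["a", "b", "d"],
    clicks={"b": 1},
    examine_prob=[1.0, 0.5, 0.25],
)
```

In this toy example the ranker's preferred document never reaches the second stage, which is the kind of interaction between candidate generation and ranking that motivates training both stages jointly.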

Taxonomy

Methods: Sparse Evolutionary Training