TL;DR
This paper introduces CounterSample, a new algorithm for counterfactual learning to rank that converges faster than traditional IPS-weighted gradient methods, especially in high-variance scenarios, supported by theoretical and empirical evidence.
Contribution
The paper proposes CounterSample, a novel algorithm that improves convergence speed in counterfactual learning to rank by reducing variance in IPS-weighted gradients.
Findings
CounterSample converges faster than SGD with IPS-weighted gradients, especially when the IPS weights are large.
Empirical results show improved performance across various biased scenarios.
Theoretical analysis supports the faster convergence rate.
Abstract
Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging as user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from the large variance introduced by the IPS weights: convergence is slow, especially when there are large IPS weights. To overcome this limitation, we propose a novel learning algorithm, called…
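To make the variance problem described in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the abstract is cut off before the method is described. The function names, the scalar per-sample gradients, and the toy propensities are all hypothetical. The sketch compares two unbiased single-sample gradient estimators: uniform sampling with IPS-weighted gradients, and, as one generic variance-reduction idea, sampling in proportion to the IPS weights with an unweighted gradient rescaled by the mean weight.

```python
def ips_weights(propensities):
    """IPS weight for each logged interaction: 1 / propensity."""
    return [1.0 / p for p in propensities]


def _mean_and_variance(values, probs):
    """Exact mean and variance of a discrete estimator.

    values[i] is the estimate returned when sample i is drawn,
    probs[i] is the probability of drawing sample i.
    """
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var


def uniform_ips_estimator(grads, weights):
    """Draw i uniformly, return w_i * g_i (IPS-weighted SGD step)."""
    n = len(grads)
    values = [w * g for w, g in zip(weights, grads)]
    probs = [1.0 / n] * n
    return _mean_and_variance(values, probs)


def weight_proportional_estimator(grads, weights):
    """Draw i with probability w_i / sum(w), return mean(w) * g_i.

    Assumption: this is a generic importance-sampling alternative,
    shown only for comparison; it is not claimed to be the paper's method.
    """
    n = len(grads)
    total = sum(weights)
    values = [(total / n) * g for g in grads]
    probs = [w / total for w in weights]
    return _mean_and_variance(values, probs)


if __name__ == "__main__":
    # Hypothetical toy data: one very small propensity gives a large weight.
    propensities = [0.9, 0.5, 0.1, 0.01]
    grads = [1.0, -0.5, 0.8, 1.2]  # scalar stand-ins for per-sample gradients
    weights = ips_weights(propensities)

    mean_u, var_u = uniform_ips_estimator(grads, weights)
    mean_w, var_w = weight_proportional_estimator(grads, weights)
    print("uniform + IPS weights: mean", mean_u, "variance", var_u)
    print("weight-proportional:   mean", mean_w, "variance", var_w)
```

Both estimators have the same expectation (the full IPS-weighted gradient), so either can drive SGD without bias. On this toy data the single large weight dominates the variance of the uniform estimator, which is the effect the abstract attributes slow convergence to. The weight-proportional variant is not guaranteed to have lower variance for every choice of gradients; it does here because the gradient magnitudes are similar across samples.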
Taxonomy
Methods: Stochastic Gradient Descent
