Understanding the Effects of the Baidu-ULTR Logging Policy on Two-Tower Models
Morris de Haan, Philipp Hager

TL;DR
This paper investigates logging policy confounding in two-tower ULTR models on the real-world Baidu-ULTR dataset, finding no significant effect on the model and pointing to a potential mismatch between expert annotations and user clicks.
Contribution
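The paper provides a real-world analysis of logging policy confounding in two-tower ULTR models, showing that the conditions for confounding are met on Baidu-ULTR yet have no significant effect, and identifying a potential mismatch between expert annotations and user click behavior.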
Findings
Confounding conditions exist in Baidu-ULTR data.
Confounding has no significant effect on the two-tower model.
A potential mismatch exists between expert annotations and user clicks.
Abstract
Despite the popularity of the two-tower model for unbiased learning to rank (ULTR) tasks, recent work suggests that it suffers from a major limitation that could lead to its collapse in industry applications: the problem of logging policy confounding. Several potential solutions have been proposed; however, these methods were mostly evaluated in semi-synthetic simulation experiments. This paper bridges the gap between theory and practice by investigating the confounding problem on the largest real-world dataset, Baidu-ULTR. Our main contributions are threefold: 1) we show that the conditions for the confounding problem are met on Baidu-ULTR, 2) we find that the confounding problem has no significant effect on the two-tower model, and 3) we point to a potential mismatch between expert annotations, the gold standard in ULTR, and user click behavior.
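For readers unfamiliar with the model, the following is a minimal sketch of one common two-tower formulation, in which a relevance tower and a bias tower each emit a logit and the click probability is the sigmoid of their sum. The additive form, the function names, and all numeric values are illustrative assumptions and are not taken from the abstract. Confounding is a concern when the logging policy already places relevant documents at top positions, since position and relevance are then correlated and the bias tower may absorb part of the relevance signal.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def click_probability(relevance_logit: float, bias_logit: float) -> float:
    """Additive two-tower form (assumed): sum the tower logits, then squash."""
    return sigmoid(relevance_logit + bias_logit)


# Hypothetical tower outputs: one document shown at rank 1 and at rank 5.
relevance_logit = 1.0                # relevance tower (query-document features)
bias_logits = {1: 0.5, 5: -1.5}      # bias tower (position only), made-up values

probs = {
    position: click_probability(relevance_logit, bias_logit)
    for position, bias_logit in bias_logits.items()
}

for position, p in probs.items():
    print(f"rank {position}: P(click) = {p:.3f}")
```

In this sketch the same document receives a lower click probability at rank 5 than at rank 1 purely because of the bias logit, which is the effect the bias tower is meant to isolate from relevance.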
