Learning Neural Ranking Models Online from Implicit User Feedback
Yiling Jia, Hongning Wang

TL;DR
This paper introduces an online learning-to-rank (OL2R) method that trains neural ranking models from implicit user feedback, capturing non-linear query-document relations while balancing exploration and exploitation.
Contribution
It presents the first neural OL2R approach with theoretical regret bounds and demonstrates superior performance over existing linear models.
Findings
Achieves $O(\log^2 T)$ regret bound.
Outperforms state-of-the-art OL2R baselines.
Effectively balances exploration and exploitation.
Abstract
Existing online learning to rank (OL2R) solutions are limited to linear models, which are incompetent to capture possible non-linear relations between queries and documents. In this work, to unleash the power of representation learning in OL2R, we propose to directly learn a neural ranking model from users' implicit feedback (e.g., clicks) collected on the fly. We focus on RankNet and LambdaRank, due to their great empirical success and wide adoption in offline settings, and control the notorious explore-exploit trade-off based on the convergence analysis of neural networks using neural tangent kernel. Specifically, in each round of result serving, exploration is only performed on document pairs where the predicted rank order between the two documents is uncertain; otherwise, the ranker's predicted order will be followed in result ranking. We prove that under standard assumptions our…
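The pairwise exploration rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a RankNet-style pairwise probability `sigmoid(s_i - s_j)` and replaces the paper's neural-tangent-kernel-based uncertainty with a single hypothetical constant `width >= 0`. The function names `certain_pairs` and `rank_with_exploration` are made up for illustration.

```python
import math
import random
from itertools import permutations


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def certain_pairs(scores, width):
    """Return the set of (i, j) where document i is confidently above j.

    `width` is a hypothetical, fixed confidence half-width (assumed >= 0);
    the paper derives its uncertainty from the neural tangent kernel instead.
    """
    certain = set()
    for i, j in permutations(range(len(scores)), 2):
        # RankNet-style probability that document i should precede j.
        p_ij = sigmoid(scores[i] - scores[j])
        # The order is treated as certain only if the lower bound exceeds 1/2.
        if p_ij - width > 0.5:
            certain.add((i, j))
    return certain


def rank_with_exploration(scores, width, rng):
    """Follow the predicted order on certain pairs, randomize uncertain ones."""
    certain = certain_pairs(scores, width)
    remaining = set(range(len(scores)))
    ranking = []
    while remaining:
        # A document is eligible if no remaining document is certainly above it.
        free = [d for d in sorted(remaining)
                if not any((other, d) in certain for other in remaining)]
        # Uniform choice among eligible documents is the exploration step.
        choice = rng.choice(free)
        ranking.append(choice)
        remaining.remove(choice)
    return ranking


scores = [3.0, 2.9, 0.0]  # made-up ranker outputs for three documents
ranking = rank_with_exploration(scores, width=0.1, rng=random.Random(0))
```

With these made-up scores, documents 0 and 1 are too close to order confidently, so their relative position is randomized, while document 2 always stays below both. Setting `width=0` removes exploration and recovers the plain score-sorted ranking.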
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and Algorithms
