Optimizing Ranking Systems Online as Bandits
Chang Li

TL;DR
This dissertation addresses four challenges in optimizing ranking systems online with bandit algorithms: effectiveness, safety, adaptation to changing user preferences (nonstationarity), and result diversification. It proposes new methods for each.
Contribution
It introduces new algorithms for online ranking optimization tackling effectiveness, safety, nonstationarity, and diversification, advancing the application of bandit frameworks in real-world systems.
Findings
MergeDTS effectively evaluates online rankers.
BubbleRank ensures safe content display.
CascadeDUCB and CascadeSWUCB handle nonstationary user preferences.
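To make the nonstationarity finding more concrete, the following is a minimal sketch of a sliding-window UCB learner under a cascade click model, where the user scans the list top-down and clicks the first attractive item. It is a generic illustration and not the dissertation's CascadeSWUCB as specified: the class name, the `window` parameter, the exploration constant 1.5, and the `simulate_click` helper are all assumptions made for this example.

```python
import math
import random
from collections import deque


class SlidingWindowCascadeUCB:
    """Illustrative sliding-window UCB for a cascade click model (hypothetical)."""

    def __init__(self, n_items, n_positions, window):
        self.n_items = n_items
        self.n_positions = n_positions
        self.window = window          # number of recent rounds to remember
        self.history = deque()        # per round: list of (item, reward)
        self.counts = [0] * n_items   # observations per item inside the window
        self.sums = [0.0] * n_items   # clicks per item inside the window
        self.t = 0

    def rank(self):
        """Return the n_positions items with the highest UCB."""
        self.t += 1
        horizon = min(self.t, self.window)

        def ucb(item):
            if self.counts[item] == 0:
                return float("inf")   # unobserved items are explored first
            mean = self.sums[item] / self.counts[item]
            # 1.5 is an assumed exploration constant, not taken from the source.
            bonus = math.sqrt(1.5 * math.log(horizon) / self.counts[item])
            return mean + bonus

        order = sorted(range(self.n_items), key=ucb, reverse=True)
        return order[: self.n_positions]

    def update(self, ranking, click_pos):
        """click_pos is the clicked position index, or None if nothing was clicked."""
        # Cascade assumption: items below the click are never examined.
        last = len(ranking) - 1 if click_pos is None else click_pos
        observed = [
            (ranking[k], 1.0 if k == click_pos else 0.0) for k in range(last + 1)
        ]
        self.history.append(observed)
        for item, reward in observed:
            self.counts[item] += 1
            self.sums[item] += reward
        # Forget the oldest round once the window is full.
        if len(self.history) > self.window:
            for item, reward in self.history.popleft():
                self.counts[item] -= 1
                self.sums[item] -= reward


def simulate_click(ranking, attraction, rng):
    """Hypothetical user: clicks the first item that attracts them, else None."""
    for pos, item in enumerate(ranking):
        if rng.random() < attraction[item]:
            return pos
    return None
```

A typical loop calls `rank()`, shows the list, and passes the observed click position to `update()`. Because statistics older than `window` rounds are discarded, the estimates can follow attraction probabilities that drift or change abruptly, at the cost of re-exploring items whose observations have expired.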
Abstract
The ranking system is a core part of modern retrieval and recommender systems, where the goal is to rank candidate items given user contexts. Optimizing ranking systems online means that the deployed system serves user requests, e.g., queries in web search, and at the same time optimizes the ranking policy by learning from user interactions, e.g., clicks. Bandits are a general online learning framework and can be used for this optimization task. However, due to the unique features of ranking, there are several challenges in designing bandit algorithms for ranking system optimization. In this dissertation, we study and propose solutions for four challenges in optimizing ranking systems online: effectiveness, safety, nonstationarity, and diversification. First, effectiveness is related to how fast the algorithm learns from interactions. We study the effective online ranker evaluation task and propose…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Scheduling and Timetabling Solutions
