Adaptively Learning to Select-Rank in Online Platforms
Jingyuan Wang, Perry Dong, Ying Jin, Ruohan Zhan, Zhengyuan Zhou

TL;DR
This paper introduces an adaptive ranking algorithm for online platforms that personalizes item orderings using a contextual bandits approach, optimizing user satisfaction efficiently in large candidate pools.
Contribution
It develops a novel bandit-based ranking method that accounts for diverse user preferences and position effects, with theoretical regret bounds and practical efficiency.
Findings
Achieves a cumulative regret bound of O(d√(NKT)), improving scalability.
Outperforms baseline algorithms in simulated and real-world tests.
Efficiently handles the action space of ranked lists, which grows exponentially in size.
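To make the size of the action space concrete, the short sketch below counts the ordered lists of K items that can be drawn from a pool of N candidates. The function name `count_ranked_lists` and the example sizes are illustrative assumptions, not values from the paper.

```python
import math


def count_ranked_lists(n_items: int, k_positions: int) -> int:
    """Number of ordered lists of k distinct items chosen from n candidates.

    This is the falling factorial n! / (n - k)!, i.e. the number of
    ranking actions if every ranked list is treated as one bandit arm.
    """
    return math.perm(n_items, k_positions)


# Hypothetical sizes, chosen only to illustrate the growth.
for n, k in [(10, 3), (20, 5), (50, 10)]:
    print(f"N={n}, K={k}: {count_ranked_lists(n, k):,} ranked lists")
```

Even modest pools produce millions of candidate lists, which is why enumerating every ranking as a separate arm is impractical.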
Abstract
Ranking algorithms are fundamental to various online platforms, from e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key component in personalizing user experience. We develop a user response model that considers diverse user preferences and the varying effects of item positions, aiming to optimize overall user satisfaction with the ranked list. We frame this problem within a contextual bandits framework, with each ranked list as an action. Our approach incorporates an upper confidence bound to adjust predicted user satisfaction scores and selects the ranking action that maximizes these adjusted scores, efficiently solved via maximum weight imperfect matching. We demonstrate that our algorithm achieves a cumulative regret bound of O(d√(NKT)) for ranking K out of N …
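The selection step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a plain linear score with a hypothetical feature vector per (item, position) pair, and it uses SciPy's `linear_sum_assignment` on a rectangular N-by-K matrix as a stand-in solver for the maximum weight imperfect matching. The names `ucb_scores`, `select_ranking`, `theta_hat`, `A_inv`, and `alpha` are all introduced here for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def ucb_scores(features, theta_hat, A_inv, alpha):
    """Optimistic score for placing each item at each position.

    features:  (N, K, d) array; features[i, k] is an assumed feature
               vector for item i shown at position k.
    theta_hat: (d,) current parameter estimate (assumed linear model).
    A_inv:     (d, d) inverse of the regularized design matrix.
    alpha:     exploration weight (a tuning assumption).
    """
    mean = features @ theta_hat  # predicted satisfaction, shape (N, K)
    # Confidence width sqrt(x^T A^{-1} x) for every (item, position) pair.
    width = np.sqrt(np.einsum("nkd,de,nke->nk", features, A_inv, features))
    return mean + alpha * width


def select_ranking(scores):
    """Pick the ranked list that maximizes the total adjusted score.

    scores: (N, K) array with N >= K. Each position receives exactly one
    item and each item is used at most once, so N - K items stay unmatched.
    Returns a length-K array: the item index shown at each position.
    """
    item_idx, pos_idx = linear_sum_assignment(scores, maximize=True)
    ranking = np.empty(scores.shape[1], dtype=int)
    ranking[pos_idx] = item_idx
    return ranking


# Toy scores for 3 items and 2 positions (made-up numbers).
toy = np.array([[0.90, 0.80],
                [0.85, 0.10],
                [0.20, 0.30]])
print(select_ranking(toy))
```

In the toy matrix, filling positions greedily would put item 0 first and then item 2 second, for a total of 1.2, while the matching places item 1 first and item 0 second, for a total of 1.65. This is the reason to solve the assignment jointly rather than position by position.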
Taxonomy
Topics: Online Learning and Analytics · Online and Blended Learning · Mobile Learning in Education
