Online Ranking with Top-1 Feedback
Sougata Chaudhuri, Ambuj Tewari

TL;DR
This paper studies online ranking with limited feedback, establishing regret bounds for various ranking measures and showing some normalized measures are not learnable in this setting.
Contribution
It introduces a novel top-1 feedback model for online ranking and provides tight regret bounds for several ranking measures, with efficient algorithms that attain these bounds for some of them.
Findings
Minimax regret for PairwiseLoss and DCG is Θ(T^{2/3}).
Efficient strategies achieve O(T^{2/3}) regret for these measures.
Normalized measures like AUC, NDCG, and MAP cannot be learned with sublinear regret.
Abstract
We consider a setting where a system learns to rank a fixed set of m items. The goal is to produce good item rankings for users with diverse interests who interact online with the system for T rounds. We consider a novel top-1 feedback model: at the end of each round, the relevance score for only the top ranked object is revealed. However, the performance of the system is judged on the entire ranked list. We provide a comprehensive set of results regarding learnability under this challenging setting. For PairwiseLoss and DCG, two popular ranking measures, we prove that the minimax regret is Θ(T^{2/3}). Moreover, the minimax regret is achievable using an efficient strategy that only spends O(m log m) time per round. The same efficient strategy achieves O(T^{2/3}) regret for Precision@k. Surprisingly, we show that for normalized versions of these ranking measures, i.e.,…
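To make the feedback model concrete, here is a minimal sketch of a single round. It assumes the commonly used DCG form with gain 2^r - 1 and discount 1/log2(1 + position); the function names `dcg` and `play_round` and the toy relevance values are hypothetical and are not taken from the paper. The point is only that the learner observes one relevance score while the gain depends on the whole list.

```python
import math

def dcg(ranking, relevance):
    """DCG of a full ranking (assumed standard form).

    ranking[0] is the item placed first; relevance[i] is the hidden
    relevance score of item i.
    """
    return sum(
        (2 ** relevance[item] - 1) / math.log2(pos + 2)
        for pos, item in enumerate(ranking)
    )

def play_round(ranking, relevance):
    """Simulate one round under top-1 feedback.

    Returns (feedback, gain). Only `feedback` is shown to the learner;
    `gain` is computed on the entire ranked list and stays hidden.
    """
    feedback = relevance[ranking[0]]   # relevance of the top-ranked item only
    gain = dcg(ranking, relevance)     # performance judged on the full list
    return feedback, gain

# Toy example with three items (illustrative values only).
relevance = [0, 2, 1]   # hidden from the learner
ranking = [1, 2, 0]     # learner's output: item 1 first, then 2, then 0
feedback, gain = play_round(ranking, relevance)
```

In this toy round the learner would see only the relevance of item 1, even though swapping the lower two items would change the DCG it is judged on.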
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
