Contextual Recommendations and Low-Regret Cutting-Plane Algorithms
Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Paes Leme, Jon Schneider

TL;DR
This paper introduces novel algorithms for contextual linear bandits with low regret, leveraging cutting-plane methods and convex geometry techniques, applicable to routing and recommendation systems.
Contribution
It presents new low-regret algorithms for a variant of contextual linear bandits, built on cutting-plane methods and convex geometry, including a variant that recommends a list of actions and nearly tight algorithms for weaker feedback models.
Findings
Achieves regret $O(d \log T)$ and $\exp(O(d \log d))$
Provides algorithms with $O(d^2 \log d)$ regret and polynomial list size
Develops nearly tight algorithms for weaker feedback models
Abstract
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are presented with a subset $\mathcal{X}_t \subseteq \mathbb{R}^d$ of possible actions. If we choose (i.e. recommend to the user) action $x_t$, we obtain utility $\langle x_t, w^* \rangle$ but only learn the identity of the best action $\arg\max_{x \in \mathcal{X}_t} \langle x, w^* \rangle$. We design algorithms for this problem which achieve regret $O(d \log T)$ and $\exp(O(d \log d))$. To accomplish this, we design novel cutting-plane algorithms with low "regret" -- the total distance between the true point and the hyperplanes the separation oracle returns. We also consider the variant where we are allowed to provide a list of several recommendations. In this…
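The interaction protocol in the abstract can be sketched in a few lines of Python. This is an illustrative simulation only, not the paper's algorithm: the names `play_round` and `simulate` are hypothetical, the action sets are drawn uniformly at random as an assumption, and the learner is a simple perceptron-style stand-in rather than a cutting-plane method. It shows how each round yields a regret value and, from the identity of the best action alone, a set of halfspace constraints that the hidden vector must satisfy.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def play_round(w_star, actions, chosen_idx):
    """Simulate one round of the recommendation protocol.

    Returns (regret, best_idx, cuts). The learner only observes best_idx;
    regret is computed here for bookkeeping. Each cut c satisfies
    dot(c, w_star) >= 0, because the best action beats every other action.
    """
    utilities = [dot(x, w_star) for x in actions]
    best_idx = max(range(len(actions)), key=lambda i: utilities[i])
    regret = utilities[best_idx] - utilities[chosen_idx]
    best = actions[best_idx]
    cuts = [
        [b - a for a, b in zip(x, best)]  # the vector best - x
        for i, x in enumerate(actions)
        if i != best_idx
    ]
    return regret, best_idx, cuts


def simulate(w_star, rounds=200, num_actions=5, seed=0):
    """Total regret of a perceptron-style stand-in learner (an assumption,
    not the cutting-plane algorithm from the paper)."""
    rng = random.Random(seed)
    d = len(w_star)
    w_hat = [0.0] * d  # current estimate of the hidden vector
    total = 0.0
    for _ in range(rounds):
        actions = [[rng.uniform(-1, 1) for _ in range(d)]
                   for _ in range(num_actions)]
        # Recommend the action that looks best under the current estimate.
        chosen = max(range(num_actions), key=lambda i: dot(actions[i], w_hat))
        regret, best_idx, _ = play_round(w_star, actions, chosen)
        total += regret
        if best_idx != chosen:
            # Move the estimate toward the revealed best action.
            w_hat = [w + b - c for w, b, c in
                     zip(w_hat, actions[best_idx], actions[chosen])]
    return total


if __name__ == "__main__":
    print(simulate([0.6, -0.8, 0.0]))
```

The `cuts` returned by `play_round` are the raw material a cutting-plane method would work with: every round narrows the set of vectors consistent with the feedback, even though the utility itself is never observed.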
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
