Exploiting Correlation to Achieve Faster Learning Rates in Low-Rank Preference Bandits
Suprovat Ghoshal, Aadirupa Saha

TL;DR
This paper introduces the Correlated Preference Bandits problem with low-rank models, showing that exploiting structured item correlations enables faster learning of the best item through subsetwise preference feedback, with theoretical guarantees.
Contribution
The paper proposes a new Block-Rank RUM model and provides tight sample complexity bounds for learning the best item, demonstrating the advantage of subsetwise queries over pairwise preferences.
Findings
Faster learning rates are achievable with structured low-rank models.
Subsetwise queries outperform pairwise preferences in exploiting correlations.
Matching lower bounds justify the sample complexity results.
Abstract
We introduce the Correlated Preference Bandits problem with random utility-based choice models (RUMs), where the goal is to identify the best item from a given pool of items through online subsetwise preference feedback. We investigate whether models with a simple correlation structure, e.g. low rank, can result in faster learning rates. While we show that the problem can be impossible to solve for general "low rank" choice models, faster learning rates can be attained assuming more structured item correlations. In particular, we introduce a new class of Block-Rank based RUMs, for which the best item is shown to be PAC learnable with fewer samples than the standard sample complexity bound known for the usual learning algorithms, which might not exploit…
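To make "subsetwise preference feedback" from a RUM concrete, the following is a minimal simulation sketch, not the paper's algorithm. It assumes Gumbel utility noise (which gives the multinomial logit model, one member of the RUM family) and uses a naive win-count rule to guess the best item; the function names `sample_winner` and `empirical_best` and all parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def sample_winner(utilities, subset, rng):
    """Return the item of `subset` whose noisy utility is largest.

    Assumption: the noise is standard Gumbel, so winners follow a
    multinomial logit choice model over the queried subset.
    """
    def noisy_utility(item):
        u = rng.random()
        # Clamp away from 0 and 1 so both logarithms are defined.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        return utilities[item] + gumbel

    return max(subset, key=noisy_utility)


def empirical_best(utilities, subset_size, rounds, seed=0):
    """Query random subsets and return the item with the most wins.

    This is a naive baseline for illustration only; it does not
    exploit any correlation or low-rank structure among the items.
    """
    rng = random.Random(seed)
    items = range(len(utilities))
    wins = Counter()
    for _ in range(rounds):
        subset = rng.sample(items, subset_size)
        wins[sample_winner(utilities, subset, rng)] += 1
    return wins.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical utilities: item 0 is clearly the best.
    demo_utilities = [3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(empirical_best(demo_utilities, subset_size=3, rounds=2000))
```

Each round reveals only the winner of the queried subset, which is the feedback model the abstract describes; a structure-aware learner would aim to need fewer such rounds than this baseline.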
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
