CONQUER: Confusion Queried Online Bandit Learning
Daniel Barsky, Koby Crammer

TL;DR
This paper introduces CONQUER, a new online bandit learning algorithm for recommendation systems that select two items based on context, using a second-order framework with confidence bounds, and demonstrates its effectiveness through theoretical analysis and experiments.
Contribution
It proposes a novel second-order algorithm framework for dual-item recommendation with a regret bound analysis and empirical validation across multiple domains.
Findings
UCB-based algorithms are less effective than greedy or sampling methods.
The proposed algorithm achieves a regret bound of O(Q_T + sqrt(TQ_T log T) + sqrt(T) log T).
Experimental results show advantages over related algorithms across 33 domains.
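To see how the three terms of the regret bound scale, the sketch below evaluates them with every hidden constant set to 1. That choice is an assumption, since the summary gives only the asymptotic form. The function names are hypothetical and are not taken from the paper.

```python
import math


def regret_bound_terms(T, Q_T):
    """Return the three terms Q_T, sqrt(T * Q_T * log T), sqrt(T) * log T.

    All constants are set to 1 (an assumption); only the growth rate is meaningful.
    """
    if T < 2 or Q_T < 0:
        raise ValueError("need T >= 2 and Q_T >= 0")
    log_T = math.log(T)
    return (float(Q_T), math.sqrt(T * Q_T * log_T), math.sqrt(T) * log_T)


def regret_bound(T, Q_T):
    """Sum of the three terms, again with constants dropped."""
    return sum(regret_bound_terms(T, Q_T))


if __name__ == "__main__":
    # With zero approximation error the bound grows like sqrt(T) log T,
    # so the per-round average shrinks as T grows.
    for T in (100, 10_000, 1_000_000):
        print(T, regret_bound(T, 0.0) / T)
```

When Q_T is zero, meaning the linear model fits the item values exactly, the first two terms vanish and only sqrt(T) log T remains. When Q_T grows linearly in T, the Q_T term dominates and the bound is no longer sublinear.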
Abstract
We present a new recommendation setting for picking out two items from a given set to be highlighted to a user, based on contextual input. These two items are presented to a user who chooses one of them, possibly stochastically, with a bias that favours the item with the higher value. We propose a second-order algorithm framework whose members use relative upper-confidence bounds to trade off exploration and exploitation, and some of which explore via sampling. We analyze one algorithm in this framework in an adversarial setting with only mild assumptions on the data, and prove a regret bound of O(Q_T + sqrt(T Q_T log T) + sqrt(T) log T), where T is the number of rounds and Q_T is the cumulative approximation error of item values using a linear model. Experiments with product reviews from 33 domains show the advantage of our methods over algorithms designed for related settings,…
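The interaction loop described above can be sketched as follows. This is a minimal illustration, not the paper's CONQUER algorithm. It makes several assumptions: a logistic user-choice model, a ridge-style update on feature differences, and a relative confidence width taken from the inverse second-order matrix. The class and function names and the parameters `eta` and `sharpness` are all hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(M, v):
    return [dot(row, v) for row in M]


class PairwiseSecondOrderLearner:
    """Hypothetical learner: linear value estimate plus an inverse second-order matrix."""

    def __init__(self, dim, eta=1.0):
        self.eta = eta  # exploration weight (assumed, not from the paper)
        self.w = [0.0] * dim
        self.b = [0.0] * dim
        # Inverse of A = I + sum of z z^T, maintained incrementally.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def pick_pair(self, items):
        # First item: greedy under the current linear model.
        first = max(range(len(items)), key=lambda i: dot(self.w, items[i]))

        # Second item: largest optimistic advantage relative to the first item.
        def relative_ucb(j):
            z = [a - c for a, c in zip(items[j], items[first])]
            width = math.sqrt(max(dot(z, mat_vec(self.A_inv, z)), 0.0))
            return dot(self.w, z) + self.eta * width

        second = max((j for j in range(len(items)) if j != first), key=relative_ucb)
        return first, second

    def update(self, chosen, rejected):
        # Regress the target +1 onto the difference z = chosen - rejected.
        z = [a - c for a, c in zip(chosen, rejected)]
        Az = mat_vec(self.A_inv, z)
        denom = 1.0 + dot(z, Az)
        n = len(z)
        # Sherman-Morrison rank-one update of the inverse.
        self.A_inv = [[self.A_inv[i][j] - Az[i] * Az[j] / denom for j in range(n)]
                      for i in range(n)]
        self.b = [bi + zi for bi, zi in zip(self.b, z)]
        self.w = mat_vec(self.A_inv, self.b)


def simulated_user(x_a, x_b, true_w, rng, sharpness=4.0):
    """Return True if the user picks item a; logistic bias toward the higher value (assumed)."""
    gap = dot(true_w, x_a) - dot(true_w, x_b)
    return rng.random() < 1.0 / (1.0 + math.exp(-sharpness * gap))


def run(rounds=200, n_items=6, true_w=(1.0, -0.5, 0.25), seed=0):
    rng = random.Random(seed)
    dim = len(true_w)
    learner = PairwiseSecondOrderLearner(dim)
    for _ in range(rounds):
        # Random contextual feature vectors stand in for real item features.
        items = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_items)]
        a, b = learner.pick_pair(items)
        if simulated_user(items[a], items[b], true_w, rng):
            learner.update(items[a], items[b])
        else:
            learner.update(items[b], items[a])
    return learner


if __name__ == "__main__":
    print(run().w)
```

The learner only ever sees which of the two displayed items was chosen, which is why the update works on the feature difference rather than on either item alone. Replacing the confidence-width term in `relative_ucb` with a random perturbation would give a sampling-style variant of the same skeleton.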
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
