Batched Dueling Bandits
Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

TL;DR
This paper introduces algorithms for batched dueling bandits that trade off the number of parallel comparison batches against regret. With only a logarithmic number of batches, the regret bounds match the best known sequential bounds up to poly-logarithmic factors, and experiments support the theory.
Contribution
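The paper provides the first regret bounds for batched dueling bandits under two standard assumptions: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity with stochastic triangle inequality. In both settings it obtains a smooth, near-optimal trade-off between the number of batches and regret.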
Findings
The algorithms match the best known sequential regret bounds, up to poly-logarithmic factors, using only a logarithmic number of batches.
The regret bounds are nearly tight: a nearly-matching lower bound complements them.
Experiments validate the theoretical results.
Abstract
The K-armed dueling bandit problem, where the feedback is in the form of noisy pairwise comparisons, has been widely studied. Previous works have only focused on the sequential setting where the policy adapts after every comparison. However, in many applications such as search ranking and recommendation systems, it is preferable to perform comparisons in a limited number of parallel batches. We study the batched K-armed dueling bandit problem under two standard settings: (i) existence of a Condorcet winner, and (ii) strong stochastic transitivity and stochastic triangle inequality. For both settings, we obtain algorithms with a smooth trade-off between the number of batches and regret. Our regret bounds match the best known sequential regret bounds (up to poly-logarithmic factors), using only a logarithmic number of batches. We complement our regret analysis with a nearly-matching…
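The batched setting described in the abstract can be made concrete with a small simulation. The sketch below is illustrative only and is not the paper's algorithm: it assumes a preference matrix P, where P[i][j] is the probability that arm i beats arm j (with P[i][i] = 0.5), uses a naive "keep the top half by wins" rule between batches, and measures regret with the usual dueling-bandit convention of averaging each arm's gap to the Condorcet winner. The names condorcet_winner, play_batch, and batched_round_robin are hypothetical helpers introduced for this example.

```python
import random


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def play_batch(P, pairs, rng):
    """Play every comparison in one batch; results are only used after the batch ends."""
    return [(i, j, rng.random() < P[i][j]) for i, j in pairs]


def batched_round_robin(P, num_batches, per_pair, seed=0):
    """Naive batched elimination (an assumption for illustration, not the paper's method).

    Each batch compares all active pairs `per_pair` times, then keeps the
    top half of the active arms ranked by number of wins.
    """
    rng = random.Random(seed)
    active = list(range(len(P)))
    best = condorcet_winner(P)
    regret = 0.0
    for _ in range(num_batches):
        if len(active) == 1:
            break
        # The whole batch is fixed up front: no adaptation inside a batch.
        pairs = [(i, j) for a, i in enumerate(active) for j in active[a + 1:]] * per_pair
        wins = {i: 0 for i in active}
        for i, j, i_won in play_batch(P, pairs, rng):
            wins[i if i_won else j] += 1
            if best is not None:
                # Regret of a duel: average gap of the two arms to the Condorcet winner.
                regret += 0.5 * ((P[best][i] - 0.5) + (P[best][j] - 0.5))
        active.sort(key=lambda a: wins[a], reverse=True)
        active = active[:max(1, (len(active) + 1) // 2)]
    return active, regret
```

Calling batched_round_robin(P, num_batches=3, per_pair=50) on a small matrix shows the tension the paper studies: fewer batches mean fewer chances to drop weak arms, so more comparisons are spent on suboptimal pairs and regret grows.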
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Optimization and Search Problems
