Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

Aadirupa Saha; Akshay Krishnamurthy

arXiv:2111.12306·cs.LG·November 25, 2021


Open Access

TL;DR

This paper introduces a new, computationally efficient algorithm for the $K$-armed contextual dueling bandit problem under realizability. The algorithm achieves the optimal regret rate and resolves an open problem of Dudík et al. [2015] on oracle-efficient, regret-optimal algorithms.

Contribution

It presents the first computationally efficient, regret-optimal algorithm for contextual dueling bandits under realizability, measured against a new notion of best response regret that is strictly stronger than the performance measures used in prior work.

Findings

1. Achieves the optimal regret rate for the new best response regret measure.
2. Runs in polynomial time, assuming access to an online square loss regression oracle over the function class $F$.
3. Resolves an open problem of Dudík et al. [2015] on oracle-efficient, regret-optimal algorithms for this problem.

Abstract

We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other. We focus on the regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class $F$. We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works. The algorithm is also computationally efficient, running in polynomial time assuming access to an online oracle for square loss regression over $F$. This resolves an open problem of Dudík et al. [2015] on oracle efficient,…
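The abstract does not spell out the algorithm itself, but the interaction protocol and the role of the square-loss oracle can be illustrated. The Python sketch below is a minimal, hypothetical simulation and not the authors' method. It assumes a linear preference class $F = \{ f_w(x, a, b) = 1/2 + w^\top (z_a - z_b) \}$ over per-arm feature vectors $z_a$, a true parameter `w_star` chosen so that realizability holds (the true preference function lies in $F$), and online ridge regression (recursive least squares) as one possible online square-loss regression oracle. The pair-selection rule is a uniform-random placeholder. All names (`simulate`, `OnlineRidgeOracle`, `true_pref`, `w_star`) are illustrative.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def feature_diff(z_a, z_b):
    """Pairwise feature z_a - z_b used by the hypothetical linear preference class."""
    return [ua - ub for ua, ub in zip(z_a, z_b)]


def true_pref(w_star, z_a, z_b):
    """Hypothetical realizable model: P(a beats b | x) = 1/2 + w* . (z_a - z_b).
    With z in [-1, 1]^d and ||w*||_1 < 1/4, this always lies in [0, 1]."""
    return 0.5 + dot(w_star, feature_diff(z_a, z_b))


class OnlineRidgeOracle:
    """Toy online square-loss regression oracle for the linear class
    f_w(x, a, b) = 1/2 + w . (z_a - z_b): recursive least squares
    (online ridge regression) with Sherman-Morrison rank-one updates."""

    def __init__(self, dim, reg=1.0):
        self.w = [0.0] * dim
        # P = (reg * I + sum of z z^T)^{-1}, starting from (reg * I)^{-1}
        self.P = [[(1.0 / reg) if i == j else 0.0 for j in range(dim)] for i in range(dim)]

    def predict(self, z_a, z_b):
        return 0.5 + dot(self.w, feature_diff(z_a, z_b))

    def update(self, z_a, z_b, y):
        z = feature_diff(z_a, z_b)
        Pz = [dot(row, z) for row in self.P]
        gain = [v / (1.0 + dot(z, Pz)) for v in Pz]
        residual = y - self.predict(z_a, z_b)  # square-loss residual on the observed bit
        self.w = [wi + gi * residual for wi, gi in zip(self.w, gain)]
        # Sherman-Morrison: P <- P - (P z)(P z)^T / (1 + z^T P z)
        n = len(z)
        self.P = [[self.P[i][j] - gain[i] * Pz[j] for j in range(n)] for i in range(n)]


def simulate(T=2000, K=4, seed=0):
    rng = random.Random(seed)
    w_star = [0.08, -0.05, 0.10]  # hypothetical true parameter, ||w*||_1 = 0.23
    d = len(w_star)
    oracle = OnlineRidgeOracle(d)
    excess = []
    for _ in range(T):
        # Context x_t: one feature vector per arm, drawn from [-1, 1]^d.
        z = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(K)]
        # Placeholder selection rule (uniform random pair), NOT the paper's algorithm.
        a, b = rng.sample(range(K), 2)
        p = true_pref(w_star, z[a], z[b])
        y = 1.0 if rng.random() < p else 0.0  # the learner sees only this preference bit
        excess.append((oracle.predict(z[a], z[b]) - p) ** 2)  # uses p, for evaluation only
        oracle.update(z[a], z[b], y)
    return w_star, oracle.w, excess
```

Calling `simulate()` returns the hypothetical true parameter, the oracle's estimate, and the per-round squared error of the oracle's predicted preference probability. Because the simulator knows the true probabilities, this error can be tracked here even though a real learner only ever observes the binary preference outcome.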

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems