Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

Viktor Bengs; Aadirupa Saha; Eyke Hüllermeier

arXiv:2202.04593 · cs.LG · October 14, 2022

TL;DR

This paper introduces a new efficient algorithm, CoLSTIM, for contextual dueling bandits under linear stochastic transitivity models, achieving near-optimal regret bounds and outperforming existing methods in experiments.

Contribution

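The paper proposes CoLSTIM, a computationally efficient algorithm with provable regret bounds for contextual dueling bandits under linear stochastic transitivity models with contextualized utilities (CoLST). It also shows that the algorithm is optimal and that it performs better than existing methods in experiments.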

Findings

01

CoLSTIM achieves regret of order Õ(√(dT)) after T rounds, where d is the dimension of each arm's feature vector.

02

CoLSTIM is shown to be optimal via a lower bound on weak regret.

03

Experiments show CoLSTIM outperforms state-of-the-art algorithms.

Abstract

We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information. We assume that the feedback process is determined by a linear stochastic transitivity model with contextualized utilities (CoLST), and the learner's task is to include the best arm (with highest latent context-dependent utility) in the duel. We propose a computationally efficient algorithm, CoLSTIM, which makes its choice based on imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. If each arm is associated with a $d$-dimensional feature vector, we show that CoLSTIM achieves a regret…
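The following minimal, stdlib-only Python sketch illustrates the idea of "imitating the feedback process with perturbed utility estimates."

It rests on these assumptions:

- **Link function.** It uses a logistic link F, so the CoLST model becomes a contextual Bradley-Terry model in which arm i beats arm j with probability F(θᵀ(x_i − x_j)).
- **Perturbation.** Under this link, adding i.i.d. standard Gumbel noise to the utilities reproduces those duel probabilities exactly, because the difference of two standard Gumbel draws is standard logistic.
- **Hypothetical components.** The helper names (`duel`, `select_pair`, `run`), the online-gradient estimator, and the rule for picking the second arm are all illustrative simplifications. This is not the authors' CoLSTIM, whose details the excerpt above does not give.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def logistic(z):
    # Assumed link F(z) = 1 / (1 + exp(-z)) (contextual Bradley-Terry model).
    return 1.0 / (1.0 + math.exp(-z))


def gumbel(rng):
    # Standard Gumbel draw by inverse CDF; clamp u away from 0 to avoid log(0).
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def duel(theta, x_i, x_j, rng):
    # Simulated CoLST feedback, assuming linear utilities u = theta^T x:
    # arm i wins (y = 1) with probability F(theta^T (x_i - x_j)).
    diff = [a - b for a, b in zip(x_i, x_j)]
    y = 1 if rng.random() < logistic(dot(theta, diff)) else 0
    return y, diff


def select_pair(theta_hat, feats, rng):
    # First arm: argmax of Gumbel-perturbed utility estimates
    # (max() evaluates the key once per arm, so each arm gets one draw).
    first = max(range(len(feats)),
                key=lambda i: dot(theta_hat, feats[i]) + gumbel(rng))
    # Second arm (hypothetical simplification): a fresh perturbation,
    # restricted to the remaining arms.
    second = max((j for j in range(len(feats)) if j != first),
                 key=lambda j: dot(theta_hat, feats[j]) + gumbel(rng))
    return first, second


def run(theta, n_arms=5, horizon=500, lr=0.1, lam=0.01, seed=0):
    rng = random.Random(seed)
    d = len(theta)
    theta_hat = [0.0] * d
    weak_regret = []
    for _ in range(horizon):
        # New context each round: one d-dimensional feature vector per arm.
        feats = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
        i, j = select_pair(theta_hat, feats, rng)
        y, diff = duel(theta, feats[i], feats[j], rng)
        # One online gradient-ascent step on the L2-regularized
        # logistic log-likelihood of the observed duel outcome.
        p = logistic(dot(theta_hat, diff))
        theta_hat = [t + lr * ((y - p) * g - lam * t)
                     for t, g in zip(theta_hat, diff)]
        # Weak regret: zero exactly when the best arm is in the duel.
        utils = [dot(theta, x) for x in feats]
        best = max(utils)
        weak_regret.append(min(best - utils[i], best - utils[j]))
    return theta_hat, weak_regret


if __name__ == "__main__":
    estimate, regret = run([2.0, -1.0, 0.5])
    print("estimate:", estimate, "cumulative weak regret:", sum(regret))
```

Perturbing the estimated utilities, rather than adding a deterministic bonus, makes the learner's choice behave like a draw from the assumed preference model.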

Taxonomy

Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Stochastic Gradient Optimization Techniques