Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models
Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier

TL;DR
This paper introduces a new efficient algorithm, CoLSTIM, for contextual dueling bandits under linear stochastic transitivity models, achieving near-optimal regret bounds and outperforming existing methods in experiments.
Contribution
The paper proposes CoLSTIM, a computationally efficient algorithm with provable regret bounds for contextual dueling bandits under CoLST models, and demonstrates its optimality and empirical superiority.
Findings
CoLSTIM achieves regret of order √(dT) after T rounds, where d is the dimension of the arms' feature vectors.
A lower bound on the weak regret shows that CoLSTIM is optimal.
Experiments show CoLSTIM outperforms state-of-the-art algorithms.
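To make the notion of weak regret concrete, the sketch below contrasts it with average regret for a single duel. It assumes both are measured as gaps in latent utility for the current context; the function name duel_regrets and the utility values are hypothetical illustrations, not definitions taken from the paper.

```python
def duel_regrets(utilities, i, j):
    """Return (weak, average) regret of dueling arms i and j.

    Assumed definitions, stated as gaps in latent utility:
      weak    = u_best - max(u_i, u_j)
      average = u_best - (u_i + u_j) / 2
    """
    best = max(utilities)
    weak = best - max(utilities[i], utilities[j])
    average = best - 0.5 * (utilities[i] + utilities[j])
    return weak, average


# Hypothetical utilities for three arms in one context; arm 1 is the best arm.
u = [0.2, 0.9, 0.5]
weak, average = duel_regrets(u, 1, 0)  # best arm included, so weak == 0.0
print(weak, average)
weak, average = duel_regrets(u, 0, 2)  # best arm missing, both regrets positive
print(weak, average)
```

Weak regret vanishes as soon as the best arm takes part in the duel, which matches the learner's goal of including the best arm; average regret also charges for the weaker partner.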
Abstract
We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be compared with each other and receives feedback in the form of noisy preference information. We assume that the feedback process is determined by a linear stochastic transitivity model with contextualized utilities (CoLST), and the learner's task is to include the best arm (the one with the highest latent context-dependent utility) in the duel. We propose a computationally efficient algorithm, CoLSTIM, which makes its choice by imitating the feedback process using perturbed context-dependent utility estimates of the underlying CoLST model. If each arm is associated with a d-dimensional feature vector, we show that CoLSTIM achieves a regret…
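The feedback process and the idea of imitating it with perturbed utilities can be illustrated with a small simulation. This is a minimal sketch under explicit assumptions, not the authors' CoLSTIM. Utilities are assumed linear, u_i = θᵀx_i, with a logistic link, so P(i beats j) = σ(u_i − u_j). The perturbations are standard Gumbel, because the difference of two i.i.d. standard Gumbel variables is standard logistic, so the argmax of perturbed utilities reproduces logistic duel outcomes. The second-arm rule and the gradient-ascent estimator are simple hypothetical stand-ins. All function names (preference_prob, simulate_duel, select_pair, fit_theta, run) and the true parameter vector are invented for illustration.

```python
import numpy as np


def preference_prob(theta, x_i, x_j):
    """P(arm i beats arm j) under an assumed logistic link: sigma(u_i - u_j)."""
    return 1.0 / (1.0 + np.exp(-(theta @ (x_i - x_j))))


def simulate_duel(theta, x_i, x_j, rng):
    """Noisy binary feedback from the environment: 1 if arm i wins, else 0."""
    return int(rng.random() < preference_prob(theta, x_i, x_j))


def select_pair(theta_hat, X, rng):
    """Pick two distinct arms from perturbed utility estimates.

    First arm: argmax of estimated utilities plus i.i.d. standard Gumbel noise,
    which mimics a logistic-model duel under theta_hat.
    Second arm (hypothetical stand-in rule): the same with fresh noise,
    restricted to the remaining arms.
    """
    u_hat = X @ theta_hat
    first = int(np.argmax(u_hat + rng.gumbel(size=len(u_hat))))
    scores = u_hat + rng.gumbel(size=len(u_hat))
    scores[first] = -np.inf  # exclude the first arm
    second = int(np.argmax(scores))
    return first, second


def fit_theta(Z, y, theta0, steps=20, lr=0.5):
    """Logistic-regression estimate from difference features Z = x_i - x_j
    and outcomes y, via warm-started gradient ascent on the log-likelihood."""
    theta = theta0.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ theta)))
        theta += lr * Z.T @ (y - p) / len(y)
    return theta


def run(T=300, K=5, seed=0):
    """Simulate T rounds with K arms and fresh d-dimensional contexts."""
    rng = np.random.default_rng(seed)
    theta_star = np.array([1.0, -0.5, 0.8])  # hypothetical true parameter
    d = len(theta_star)
    diffs, outcomes = [], []
    theta_hat = np.zeros(d)
    for _ in range(T):
        X = rng.normal(size=(K, d))  # context-dependent arm features this round
        i, j = select_pair(theta_hat, X, rng)
        diffs.append(X[i] - X[j])
        outcomes.append(simulate_duel(theta_star, X[i], X[j], rng))
        theta_hat = fit_theta(np.array(diffs), np.array(outcomes, dtype=float), theta_hat)
    return theta_hat, theta_star


theta_hat, theta_star = run()
print("estimate:", np.round(theta_hat, 2), "truth:", theta_star)
```

With enough rounds the estimate should point in roughly the same direction as the true parameter, though the exact values depend on the seed and on the assumed link function.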
Taxonomy
Topics: Advanced Bandit Algorithms Research · Distributed Sensor Networks and Detection Algorithms · Stochastic Gradient Optimization Techniques
