Biased Dueling Bandits with Stochastic Delayed Feedback
Bongsoo Yi, Yue Kang, Yao Li

TL;DR
This paper studies the dueling bandit problem with stochastic delayed feedback and preference bias, proposing two algorithms with regret bounds and validating them empirically in realistic delayed-feedback scenarios.
Contribution
It introduces the biased dueling bandit problem with stochastic delays and develops two algorithms for different delay information settings, with theoretical regret analysis.
Findings
The proposed algorithms achieve optimal regret bounds in their respective settings.
Empirical results validate the effectiveness of the proposed algorithms.
The study addresses realistic delayed feedback in dueling bandits.
Abstract
The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become increasingly prominent due to its broad applications in online advertising, recommendation systems, information retrieval, and more. However, in many real-world applications, feedback on actions is subject to unavoidable delays and is not immediately available to the agent. This partial observability poses a significant challenge to the existing dueling bandit literature, as it affects how quickly and accurately the agent can update its policy on the fly. In this paper, we introduce and examine the biased dueling bandit problem with stochastic delayed feedback, a practical problem that captures a more realistic and intriguing scenario involving a preference bias between the selections. We present two algorithms…
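To make the delayed-feedback setting concrete, here is a minimal Python sketch of a simulator, assuming a known preference matrix `pref` (where `pref[i][j]` is the probability that arm i beats arm j) and delays drawn from an exponential distribution and rounded down to whole rounds. The class `DelayedDuelingEnv` and its parameters are hypothetical illustrations, not the paper's code, and the paper's preference-bias model is not reproduced because its exact form is not given here. The point it shows is that the outcome of a duel played at round t only becomes visible to the agent at some later round.

```python
import heapq
import random


class DelayedDuelingEnv:
    """Hypothetical simulator: each duel's outcome is revealed after a random delay.

    Assumptions (not from the paper): pref[i][j] = P(arm i beats arm j),
    delays ~ floor(Exponential(mean=mean_delay)) rounds, no preference bias.
    """

    def __init__(self, pref, mean_delay, seed=0):
        self.pref = pref
        self.mean_delay = mean_delay
        self.rng = random.Random(seed)
        self.t = 0            # current round
        self.pending = []     # min-heap of (arrival_round, seq, i, j, i_won)
        self.seq = 0          # tie-breaker so heap entries never compare arms

    def duel(self, i, j):
        """Play pair (i, j) this round; return all feedback that arrives this round."""
        i_won = self.rng.random() < self.pref[i][j]
        if self.mean_delay > 0:
            delay = int(self.rng.expovariate(1.0 / self.mean_delay))
        else:
            delay = 0
        heapq.heappush(self.pending, (self.t + delay, self.seq, i, j, i_won))
        self.seq += 1
        # Release every outcome whose arrival round has been reached.
        arrived = []
        while self.pending and self.pending[0][0] <= self.t:
            _, _, a, b, won = heapq.heappop(self.pending)
            arrived.append((a, b, won))
        self.t += 1
        return arrived


# Usage: a uniformly random agent that only learns from feedback that has arrived.
pref = [[0.5, 0.7, 0.8],
        [0.3, 0.5, 0.6],
        [0.2, 0.4, 0.5]]
env = DelayedDuelingEnv(pref, mean_delay=5.0, seed=1)
agent_rng = random.Random(2)
wins = [[0] * 3 for _ in range(3)]
observed = 0
for _ in range(1000):
    i, j = agent_rng.sample(range(3), 2)
    for a, b, won in env.duel(i, j):
        winner, loser = (a, b) if won else (b, a)
        wins[winner][loser] += 1
        observed += 1

# Every duel is either already observed or still pending.
print(observed, len(env.pending))
```

An agent built on this loop must act on the partial win counts in `wins` while some outcomes are still in `env.pending`, which is the source of the difficulty the abstract describes.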
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Distributed Sensor Networks and Detection Algorithms
