# Simple Algorithms for Dueling Bandits

**Authors:** Tyler Lekang, Andrew Lamperski

arXiv: 1906.07611 · 2019-06-19

## TL;DR

This paper introduces simple algorithms for Dueling Bandits whose regret bounds do not depend on the preference gap between actions. Through theoretical analysis and synthetic experiments, it shows that they are competitive with the state of the art.

## Contribution

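The paper proposes simple Dueling Bandit algorithms whose regret bounds do not depend on the preference gap Δ between actions. It compares them with the state-of-the-art algorithms ISS and DTS in terms of complexity, regret upper bounds, and empirical regret on synthetic data.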

## Key findings

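- Regret bounds of order O(T^ρ) for time horizon T, with 1/2 ≤ ρ ≤ 3/4, independent of any preference gap Δ
- Comparison with the state-of-the-art algorithms ISS and DTS in terms of complexity and regret upper bounds
- On synthetic data, regret performance comparable to ISS and DTS, and in some cases better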
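A short worked reading of what an O(T^ρ) bound implies, assuming only that the cumulative regret R(T) is bounded by C·T^ρ for a constant C that (per the abstract) does not depend on Δ. The symbols R(T) and C are notation introduced here for illustration, not taken from the paper:

$$
R(T) \le C\,T^{\rho}, \quad \tfrac{1}{2} \le \rho \le \tfrac{3}{4}
\;\Longrightarrow\;
\frac{R(T)}{T} \le C\,T^{\rho-1} \le C\,T^{-1/4} \xrightarrow[T\to\infty]{} 0 .
$$

For example, with ρ = 3/4 and T = 10^4, the per-round regret is at most C·10^{-1}. Because C does not involve Δ, the guarantee stays meaningful even when the preference gap between the best action and its competitors is arbitrarily small.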

## Abstract

In this paper, we present simple algorithms for Dueling Bandits. We prove that the algorithms have regret bounds for time horizon T of order O(T^ρ) with 1/2 ≤ ρ ≤ 3/4, which importantly do not depend on any preference gap between actions, Δ. Dueling Bandits is an important extension of the Multi-Armed Bandit problem, in which the algorithm must select two actions at a time and only receives binary feedback for the duel outcome. This is analogous to comparisons in which the rater can only provide yes/no or better/worse type responses. We compare our simple algorithms to the current state-of-the-art for Dueling Bandits, ISS and DTS, discussing complexity and regret upper bounds, and conducting experiments on synthetic data that demonstrate their regret performance, which in some cases exceeds state-of-the-art.
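To make the setting described in the abstract concrete, here is a minimal Python sketch of a dueling bandit instance: two arms are selected per round and only a binary win/loss outcome is observed. Everything in it is an illustrative assumption. The 3-arm preference matrix is hypothetical. The regret convention (the average of the Condorcet winner's advantage over the two selected arms) is one common choice; some papers drop the factor 1/2. The explore-then-commit baseline with a Copeland-style majority count is a generic textbook strategy. It is not one of the algorithms proposed in the paper, nor ISS or DTS.

```python
import random


def make_preference_matrix():
    # Hypothetical 3-arm instance: P[i][j] = probability that arm i beats arm j.
    # Satisfies P[i][j] + P[j][i] = 1 and P[i][i] = 0.5; arm 0 is the Condorcet winner.
    return [
        [0.5, 0.6, 0.7],
        [0.4, 0.5, 0.6],
        [0.3, 0.4, 0.5],
    ]


def duel(P, i, j, rng):
    """Binary feedback only: return 1 if arm i wins against arm j, else 0."""
    return 1 if rng.random() < P[i][j] else 0


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    K = len(P)
    for i in range(K):
        if all(P[i][j] > 0.5 for j in range(K) if j != i):
            return i
    return None


def instant_regret(P, best, i, j):
    # Assumed convention: average advantage of the best arm over the two dueled arms.
    return ((P[best][i] - 0.5) + (P[best][j] - 0.5)) / 2


def explore_then_commit(P, T, n_explore, rng):
    """Generic baseline (not the paper's method): duel every pair n_explore times,
    then play the empirical majority winner against itself for the remaining rounds."""
    K = len(P)
    best = condorcet_winner(P)  # used only to measure regret, never by the learner
    wins = [[0] * K for _ in range(K)]
    counts = [[0] * K for _ in range(K)]
    regret, t = 0.0, 0

    # Exploration phase: round-robin over all unordered pairs, stopping at the horizon.
    for i in range(K):
        for j in range(i + 1, K):
            for _ in range(n_explore):
                if t >= T:
                    break
                w = duel(P, i, j, rng)
                wins[i][j] += w
                wins[j][i] += 1 - w
                counts[i][j] += 1
                counts[j][i] += 1
                regret += instant_regret(P, best, i, j)
                t += 1

    # Copeland-style score: number of opponents an arm beat in a strict empirical majority.
    def score(a):
        return sum(
            1 for b in range(K)
            if b != a and counts[a][b] > 0 and wins[a][b] > counts[a][b] / 2
        )

    chosen = max(range(K), key=score)
    # Commit phase: duel the chosen arm against itself for every remaining round.
    regret += (T - t) * instant_regret(P, best, chosen, chosen)
    return chosen, regret
```

For example, `explore_then_commit(make_preference_matrix(), 10000, 1000, random.Random(0))` spends 3000 rounds exploring the three pairs. This accrues regret of 1000 × (0.05 + 0.10 + 0.15) = 300. When the commit step picks arm 0, the remaining rounds add no regret. A fixed exploration budget keeps the sketch short. Methods with provable O(T^ρ) guarantees, such as those in the paper, must control the exploration more carefully.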

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.07611/full.md

## Figures

64 figures with captions in the complete paper: https://tomesphere.com/paper/1906.07611/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1906.07611/full.md

---
Source: https://tomesphere.com/paper/1906.07611