Bias-Robust Bayesian Optimization via Dueling Bandits
Johannes Kirschner, Andreas Krause

TL;DR
This paper introduces a new kernelized dueling bandit algorithm based on information-directed sampling, designed to handle adversarial biases in Bayesian optimization, with theoretical regret guarantees and extensions to non-linear rewards.
Contribution
The paper reduces confounded Bayesian optimization to the dueling bandit model and proposes the first efficient kernelized dueling bandit algorithm with cumulative regret guarantees.
Findings
First kernelized dueling bandit algorithm with regret bounds
Extension to non-linear reward functions
Links to doubly-robust estimation
Abstract
We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model. Then we propose a novel approach for dueling bandits based on information-directed sampling (IDS). Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees. Our analysis further generalizes a previously proposed semi-parametric linear bandit model to non-linear reward functions, and uncovers interesting links to doubly-robust estimation.
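The abstract states the reduction to dueling bandits without giving its mechanics. As a rough intuition only, the sketch below shows why difference-style feedback is insensitive to an additive bias. It assumes, purely for illustration, that two evaluations in the same round share the same bias term; this is not necessarily the construction used in the paper. The reward function `f`, the bias sequence, and all helper names are hypothetical.

```python
import math
import random


def f(x):
    # Hypothetical reward function; in practice it is unknown to the learner.
    return -(x - 0.3) ** 2


def biased_observation(x, bias, noise_std, rng):
    # Assumed observation model: reward + uncontrolled bias + Gaussian noise.
    return f(x) + bias + rng.gauss(0.0, noise_std)


rng = random.Random(0)
x1, x2 = 0.3, 0.9
true_gap = f(x1) - f(x2)

raw_values = []   # single-point observations of x1, contaminated by the bias
differences = []  # dueling-style feedback y(x1) - y(x2)

for t in range(2000):
    # Arbitrary drifting bias, standing in for a hidden confounder.
    bias = 10.0 * math.sin(t) + 0.01 * t
    # Illustrative assumption: both evaluations in round t see the same bias.
    y1 = biased_observation(x1, bias, 0.1, rng)
    y2 = biased_observation(x2, bias, 0.1, rng)
    raw_values.append(y1)
    differences.append(y1 - y2)

raw_mean = sum(raw_values) / len(raw_values)
diff_mean = sum(differences) / len(differences)

print("true f(x1):", f(x1), "average raw observation:", raw_mean)
print("true gap:", true_gap, "average difference:", diff_mean)
```

Under this assumption the shared bias cancels in every difference, so the averaged differences track f(x1) - f(x2), while the averaged raw observations are pulled far from f(x1) by the drift. Only relative information about the reward survives, which is the kind of feedback a dueling bandit learner works with.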
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics
