Robust Contextual Linear Bandits
Rong Zhu, Branislav Kveton

TL;DR
This paper introduces robust contextual bandit algorithms that handle model misspecification caused by unobserved inter-arm heterogeneity, improving both robustness and computational efficiency.
Contribution
It proposes a new setting, the robust contextual bandit, in which arm-specific random variables explain inter-arm heterogeneity that the context does not capture. It develops two algorithms for this setting, RoLinUCB and RoLinTS, and bounds their Bayes regret.
Findings
RoLinTS is comparably statistically efficient to classic methods under low misspecification.
RoLinTS is more robust than classic methods under high misspecification.
RoLinTS is significantly more computationally efficient than naive implementations.
Abstract
Model misspecification is a major consideration in applications of statistical methods and machine learning. However, it is often neglected in contextual bandits. This paper studies a common form of misspecification, an inter-arm heterogeneity that is not captured by context. To address this issue, we assume that the heterogeneity arises due to arm-specific random variables, which can be learned. We call this setting a robust contextual bandit. The arm-specific variables explain the unknown inter-arm heterogeneity, and we incorporate them in the robust contextual estimator of the mean reward and its uncertainty. We develop two efficient bandit algorithms for our setting: a UCB algorithm called RoLinUCB and a posterior-sampling algorithm called RoLinTS. We analyze both algorithms and bound their n-round Bayes regret. Our experiments show that RoLinTS is comparably statistically…
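To make the setting concrete, here is a minimal sketch of posterior sampling in a linear bandit with an additive arm-specific random effect. It is not the paper's RoLinTS. It is a naive illustration that assumes rewards of the form (context · theta) + (arm effect) + Gaussian noise, with Gaussian priors on theta and on each arm effect, and it fits both jointly by appending a one-hot arm indicator to the context. The function name, the priors, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def run_random_effect_ts(n_rounds=500, n_arms=5, dim=3, sigma_theta=1.0,
                         sigma_arm=1.0, sigma_noise=0.5, seed=0):
    """Naive Thompson sampling with arm-specific random effects (illustrative).

    Assumed model (not taken from the paper):
        reward = context @ theta + arm_effect[arm] + Gaussian noise
    """
    rng = np.random.default_rng(seed)

    # Hypothetical ground truth, drawn from the assumed priors.
    theta = rng.normal(0.0, sigma_theta, size=dim)
    arm_effect = rng.normal(0.0, sigma_arm, size=n_arms)

    # Joint Gaussian posterior over [theta, arm effects], kept in
    # information form: a precision matrix and a precision-weighted mean.
    prior_precision = np.concatenate([
        np.full(dim, sigma_theta ** -2),
        np.full(n_arms, sigma_arm ** -2),
    ])
    precision = np.diag(prior_precision)
    weighted_sum = np.zeros(dim + n_arms)

    per_round_regret = np.zeros(n_rounds)
    for t in range(n_rounds):
        # Each arm gets a fresh context; the one-hot block selects its effect.
        contexts = rng.normal(size=(n_arms, dim))
        features = np.hstack([contexts, np.eye(n_arms)])

        # Sample a joint parameter vector from the current posterior.
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0  # enforce symmetry against round-off
        sample = rng.multivariate_normal(cov @ weighted_sum, cov)

        # Act greedily with respect to the sampled parameters.
        arm = int(np.argmax(features @ sample))

        true_means = contexts @ theta + arm_effect
        reward = true_means[arm] + rng.normal(0.0, sigma_noise)

        # Bayesian linear-regression update with the pulled arm's features.
        z = features[arm]
        precision += np.outer(z, z) / sigma_noise ** 2
        weighted_sum += z * reward / sigma_noise ** 2

        per_round_regret[t] = true_means.max() - true_means[arm]

    posterior_mean = np.linalg.solve(precision, weighted_sum)
    return np.cumsum(per_round_regret), posterior_mean


if __name__ == "__main__":
    cumulative_regret, posterior_mean = run_random_effect_ts()
    print("final cumulative regret:", cumulative_regret[-1])
```

This version inverts a (dim + n_arms) × (dim + n_arms) matrix every round, so its cost grows with the number of arms. That is the kind of overhead a more careful implementation would try to avoid.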
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms
