Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits

Wonyoung Kim; Kyungbok Lee; Myunghee Cho Paik

arXiv:2209.06983 · stat.ML · March 2, 2023 · 1 citation

TL;DR

This paper introduces Double Doubly Robust Thompson Sampling, an algorithm for generalized linear contextual bandits. It uses a novel estimator that considers all contexts, attains improved regret bounds, and is validated empirically.

Contribution

The paper presents the double doubly-robust (DDR) estimator and a new regret bound for GLM bandits that is obtained without discarding observed rewards, in contrast to the approach of Auer [2002].

Findings

01

Achieves a regret bound of $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ for GLM bandits.

02

First regret bound under the margin condition for GLMs with different contexts for all arms.

03

Empirical results demonstrate the effectiveness of the proposed algorithm.

Abstract

We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds, where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of the variance of rewards. In several practical cases where $\phi = O(d)$, our result is the first regret bound for generalized linear model (GLM) bandits with the order $\sqrt{d}$ without relying on the approach of Auer [2002]. We achieve this bound using a novel estimator called the double doubly-robust (DDR) estimator, a subclass of the doubly-robust (DR) estimator but with a tighter error bound. The approach of Auer [2002] achieves independence by discarding the observed rewards, whereas our algorithm achieves independence considering all contexts using our DDR estimator. We also provide an $O(\kappa^{-1} \phi \log(NT) \log T)$ regret bound for $N$ …
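To make the idea of a DR-type estimator that "considers all contexts" concrete, the sketch below shows the generic doubly-robust pseudo-reward construction for a single round. It is not the paper's DDR estimator, whose definition does not appear on this page. The function name `dr_pseudo_rewards`, the logistic mean function, and the arguments `beta_hat` (an imputation estimate) and `probs` (arm-selection probabilities) are all illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic mean function, assumed here as one example of a GLM link."""
    return 1.0 / (1.0 + np.exp(-z))


def dr_pseudo_rewards(contexts, beta_hat, chosen, reward, probs, mean_fn=sigmoid):
    """Generic doubly-robust pseudo-rewards for all arms in one round.

    contexts : (N, d) array, one context vector per arm
    beta_hat : (d,) imputation estimate of the coefficient vector (assumed given)
    chosen   : index of the arm that was actually pulled
    reward   : observed reward of the chosen arm
    probs    : (N,) probabilities with which each arm could have been selected
    """
    contexts = np.asarray(contexts, dtype=float)
    probs = np.asarray(probs, dtype=float)

    # Model-based imputation of the mean reward for every arm.
    imputed = mean_fn(contexts @ np.asarray(beta_hat, dtype=float))

    # Indicator of the pulled arm.
    indicator = np.zeros(len(probs))
    indicator[chosen] = 1.0

    # Inverse-probability-weighted residual: only the pulled arm is corrected,
    # yet every arm receives a pseudo-reward, so no context is discarded.
    return imputed + indicator / probs * (reward - imputed)


if __name__ == "__main__":
    contexts = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    pseudo = dr_pseudo_rewards(
        contexts,
        beta_hat=np.zeros(2),
        chosen=1,
        reward=1.0,
        probs=np.array([0.2, 0.5, 0.3]),
    )
    print(pseudo)
```

Unselected arms keep their imputed value, while the pulled arm is adjusted by the observed residual divided by its selection probability. If the selection probabilities are correct, each pseudo-reward has the true mean reward as its expectation regardless of how accurate `beta_hat` is, which is one half of the "doubly robust" property.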

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Smart Grid Energy Management