Robust Bandit Learning with Imperfect Context

Jianyi Yang; Shaolei Ren

arXiv:2102.05018 · cs.LG · April 6, 2021

TL;DR

This paper introduces robust algorithms for contextual bandits that operate effectively despite imperfect context information, providing theoretical guarantees and practical validation in cloud resource management scenarios.

Contribution

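The paper proposes two arm selection algorithms, MaxMinUCB and MinWD, for contextual bandits in which only imperfect context is available at selection time. For both, it derives regret and reward bounds relative to an oracle that knows the true context.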

Findings

01. Both algorithms achieve asymptotic optimality compared to oracle methods.

02. Theoretical bounds demonstrate robustness against context errors.

03. Empirical results validate the algorithms' effectiveness in real-world scenarios.

Abstract

A standard assumption in contextual multi-armed bandits is that the true context is perfectly known before arm selection. In many practical applications (e.g., cloud resource management), however, the context available before arm selection can only be obtained by prediction, which is subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection, while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB), which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation), which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that as time goes…
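To make the difference between the two selection rules concrete, here is a minimal sketch of how they could be written, under simplifying assumptions that are not taken from the paper: the uncertainty about the true context is represented by a small finite set of candidate contexts, and a table `ucb[a, c]` of upper confidence bounds for each arm `a` under each candidate context `c` is already available. The function names `max_min_ucb` and `min_wd` and the numbers in the table are hypothetical, and the UCB values stand in for the unknown reward when computing degradation.

```python
import numpy as np

def max_min_ucb(ucb):
    """Pick the arm whose worst-case UCB over candidate contexts is largest.

    ucb[a, c] is an assumed optimistic reward estimate for arm a
    under candidate context c (illustrative input, not the paper's estimator).
    """
    worst_case = ucb.min(axis=1)          # worst UCB of each arm
    return int(np.argmax(worst_case))     # arm with the best worst case

def min_wd(ucb):
    """Pick the arm whose worst-case degradation is smallest.

    Degradation of arm a under context c is measured here as the gap
    between the best UCB under c and the UCB of arm a under c.
    """
    best_per_context = ucb.max(axis=0)              # best value for each context
    degradation = best_per_context[None, :] - ucb   # gap for every (arm, context)
    worst_degradation = degradation.max(axis=1)     # worst gap of each arm
    return int(np.argmin(worst_degradation))

# Hypothetical table: 3 arms (rows) x 2 candidate contexts (columns).
ucb = np.array([
    [0.9, 0.1],
    [0.5, 0.4],
    [0.8, 0.2],
])

arm_reward_robust = max_min_ucb(ucb)
arm_regret_robust = min_wd(ucb)
print(arm_reward_robust, arm_regret_robust)
```

On this toy table the two rules select different arms: the reward-robust rule favors the arm with the highest guaranteed value, while the regret-robust rule favors the arm that is never far from the best choice under any candidate context.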

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics