Robust Bandit Learning with Imperfect Context
Jianyi Yang, Shaolei Ren

TL;DR
This paper introduces robust algorithms for contextual bandits that operate effectively despite imperfect context information, providing theoretical guarantees and practical validation in cloud resource management scenarios.
Contribution
The paper proposes two algorithms for contextual bandits with imperfect context: MaxMinUCB, which maximizes the worst-case reward, and MinWD, which minimizes the worst-case regret. It derives regret and reward bounds for both.
Findings
Both algorithms achieve asymptotic optimality compared to oracle methods.
Theoretical bounds demonstrate robustness against context errors.
Empirical results support the algorithms' effectiveness in cloud resource management scenarios.
Abstract
A standard assumption in contextual multi-arm bandit is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), prior to arm selection, the context information can only be acquired by prediction subject to errors or adversarial modification. In this paper, we study a contextual bandit setting in which only imperfect context is available for arm selection while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that as time goes…
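The two selection rules described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the true context lies in an L2 ball of known radius around the predicted context, that the worst case over that ball is approximated by random sampling, and that a UCB function `ucb(context, arm)` is already available. The degradation in `min_wd` is measured as the gap between the best UCB and the chosen arm's UCB in each candidate context, which is also an assumption of this sketch. All function names, the radius, and the toy linear UCB are hypothetical.

```python
import numpy as np


def sample_uncertainty_ball(predicted, radius, n_samples, rng):
    """Draw candidate true contexts from an L2 ball around the predicted context.

    Assumption of this sketch: the context error is bounded by `radius`.
    The predicted context itself is always included as the first candidate.
    """
    d = predicted.shape[0]
    directions = rng.normal(size=(n_samples, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling by u**(1/d) spreads the samples uniformly over the ball's volume.
    radii = radius * rng.uniform(size=(n_samples, 1)) ** (1.0 / d)
    return np.vstack([predicted, predicted + radii * directions])


def ucb_table(ucb, contexts, n_arms):
    """Evaluate ucb(context, arm) for every candidate context (rows) and arm (columns)."""
    return np.array([[ucb(c, a) for a in range(n_arms)] for c in contexts])


def max_min_ucb(table):
    """Pick the arm whose worst-case UCB over the candidate contexts is largest."""
    return int(np.argmax(table.min(axis=0)))


def min_wd(table):
    """Pick the arm whose worst-case degradation is smallest.

    Degradation in a context = best UCB in that context minus the arm's UCB.
    """
    degradation = table.max(axis=1, keepdims=True) - table
    return int(np.argmin(degradation.max(axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(3, 4))  # hypothetical per-arm linear reward weights

    def toy_ucb(context, arm):
        # Hypothetical UCB: linear reward estimate plus a constant exploration bonus.
        return float(weights[arm] @ context) + 0.1

    predicted_context = np.array([0.5, -0.2, 0.1, 0.3])
    candidates = sample_uncertainty_ball(predicted_context, radius=0.2, n_samples=200, rng=rng)
    table = ucb_table(toy_ucb, candidates, n_arms=3)
    print("MaxMinUCB arm:", max_min_ucb(table))
    print("MinWD arm:", min_wd(table))
```

The two rules can disagree. With UCB values of 10 and 4 for two arms in one candidate context and 1 and 2 in another, the worst-case reward rule prefers the second arm (minimum 2 versus 1), while the worst-case degradation rule prefers the first (maximum gap 1 versus 6).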
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics
