Learning for Bandits under Action Erasures
Osama Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

TL;DR
This paper introduces a robust multi-armed bandit framework that handles action erasures over communication channels, providing algorithms with near-optimal regret bounds despite erasures.
Contribution
It proposes a scheme compatible with any MAB algorithm to handle action erasures and introduces a modified successive arm elimination with proven optimal regret bounds.
Findings
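The scheme achieves worst-case regret within a factor of O(1/√(1-ε)) of the underlying algorithm's no-erasure worst-case regret, where ε is the erasure probability.
The modified successive arm elimination algorithm has a regret of Õ(√(KT) + K/(1-ε)), where K is the number of arms and T is the time horizon.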
A matching lower bound confirms the optimality of the proposed regret bound.
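To get a feel for how these bounds scale with the erasure probability, the short sketch below evaluates the two expressions numerically. It ignores the constants and logarithmic factors hidden by the O and Õ notation, so the numbers show only the shape of the bounds, not actual regret values. The function names are illustrative and do not come from the paper.

```python
import math


def erasure_factor(eps: float) -> float:
    """Return 1/sqrt(1 - eps), the multiplicative factor with constants dropped."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return 1.0 / math.sqrt(1.0 - eps)


def sae_bound_shape(num_arms: int, horizon: int, eps: float) -> float:
    """Return sqrt(K*T) + K/(1 - eps), with constants and log factors dropped."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    return math.sqrt(num_arms * horizon) + num_arms / (1.0 - eps)


if __name__ == "__main__":
    for eps in (0.0, 0.5, 0.75, 0.9):
        print(eps, erasure_factor(eps), sae_bound_shape(4, 100, eps))
```

Under this reading, erasures inflate the generic scheme's bound multiplicatively, while for the modified successive arm elimination they only add a term K/(1-ε) that does not grow with T.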
Abstract
We consider a novel multi-arm bandit (MAB) setup, where a learner needs to communicate the actions to distributed agents over erasure channels, while the rewards for the actions are directly available to the learner through external sensors. In our model, while the distributed agents know if an action is erased, the central learner does not (there is no feedback), and thus does not know whether the observed reward resulted from the desired action or not. We propose a scheme that can work on top of any (existing or future) MAB algorithm and make it robust to action erasures. Our scheme results in a worst-case regret over action-erasure channels that is at most a factor of O(1/√(1-ε)) away from the no-erasure worst-case regret of the underlying MAB algorithm, where ε is the erasure probability. We also propose a modification of the successive arm elimination…
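The erasure model can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's scheme: each transmitted action is erased independently with probability ε, and, as an assumption not spelled out in the abstract above, an agent that misses an action keeps playing the last action it successfully received (starting from a hypothetical `initial_action`). The learner sees only the rewards, so it cannot tell which entries of `played` differ from what it sent.

```python
import random


def simulate_channel(sent_actions, eps, initial_action=0, seed=None):
    """Return the actions actually played by the agent.

    Assumption (for illustration only): on an erasure the agent repeats
    the last action it successfully received.
    """
    rng = random.Random(seed)
    last_received = initial_action
    played = []
    for action in sent_actions:
        # The action gets through with probability 1 - eps.
        if rng.random() >= eps:
            last_received = action
        played.append(last_received)
    return played


if __name__ == "__main__":
    sent = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
    played = simulate_channel(sent, eps=0.5, seed=0)
    mismatches = sum(s != p for s, p in zip(sent, played))
    print("sent:  ", sent)
    print("played:", played)
    print("rounds where the reward came from an unintended action:", mismatches)
```

Every mismatched round yields a reward that the learner would wrongly attribute to the action it sent, which is the difficulty a robust scheme has to work around.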
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning
