Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms
Jeongyeol Kwon, Yonathan Efroni, Constantine Caramanis, Shie Mannor

TL;DR
This paper investigates the fundamental limits and defenses for robust policy learning in multitask contextual bandits with a small fraction of adversarial users, revealing lower bounds and proposing efficient robust algorithms.
Contribution
It establishes a lower bound on the per-user interactions required when some users are adversarial, and proposes algorithms based on robust mean estimation that learn a near-optimal policy.
Findings
A lower bound of $\tilde{\Omega}(\min(S,A) \cdot \alpha^2 / \epsilon^2)$ interactions per user.
An upper bound of $\tilde{O}(\min(S,A) \cdot \alpha / \epsilon^2)$ interactions per user, achieved using robust mean estimators.
The upper bound can be improved further under assumptions on the distribution of contexts.
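To illustrate why robust mean estimation matters in this setting, here is a minimal sketch of a univariate trimmed mean applied to per-user reward averages. It is not the estimator from the paper; the function names `trimmed_mean` and `simulate`, the choice of attack (every adversarial user reports the maximum reward), and all parameter values are assumptions made for illustration only.

```python
import random
import statistics


def trimmed_mean(values, alpha):
    """Drop the alpha-fraction smallest and largest values, then average the rest."""
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(alpha * len(ordered))  # number of values trimmed from each end
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


def simulate(n_users=1000, alpha=0.1, true_mean=0.5, per_user=50, seed=0):
    """Compare the naive and trimmed estimates of one action's mean reward.

    Hypothetical setup: good users report the average of `per_user`
    Bernoulli(true_mean) rewards; adversarial users all report 1.0.
    """
    rng = random.Random(seed)
    n_bad = int(alpha * n_users)
    reports = []
    for _ in range(n_users - n_bad):
        rewards = [1.0 if rng.random() < true_mean else 0.0 for _ in range(per_user)]
        reports.append(sum(rewards) / per_user)
    reports.extend([1.0] * n_bad)  # coordinated attack: inflate the estimate
    return statistics.fmean(reports), trimmed_mean(reports, alpha)


if __name__ == "__main__":
    naive, robust = simulate()
    print(f"naive mean:   {naive:.3f}")
    print(f"trimmed mean: {robust:.3f}")
```

In this toy setup the naive average is pulled toward the adversarial reports, while the trimmed mean stays closer to the good users' true mean. Trimming also discards some honest reports, so a small bias remains.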
Abstract
Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $\alpha$ of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with $S$ contexts and $A$ actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that $O(1/\epsilon^2)$ per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there are a super-polynomial number of users, we show a…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning
