Coordinated Attacks against Contextual Bandits: Fundamental Limits and Defense Mechanisms

Jeongyeol Kwon; Yonathan Efroni; Constantine Caramanis; Shie Mannor

arXiv:2201.12700·cs.LG·February 1, 2022

TL;DR

This paper studies robust policy learning in multitask contextual bandits where a small fraction of users is adversarial. It proves a lower bound on the per-user interactions required and proposes efficient algorithms based on robust mean estimation.

Contribution

The paper establishes a lower bound on the number of per-user interactions needed when some users are adversarial, and shows that robust mean estimators can be used to learn a near-optimal policy.

Findings

01

A lower bound of $\tilde{\Omega}(\min(S,A) \times \alpha^2 / \epsilon^2)$ interactions per user.

02

An upper bound of $\tilde{O}(\min(S,A) \times \alpha / \epsilon^2)$ interactions per user, achieved with robust mean estimators.

03

Further improvements are possible under additional assumptions on the context distribution.
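
To get a feel for how the two rates in findings 01 and 02 relate, the short sketch below evaluates min(S,A)·α²/ε² and min(S,A)·α/ε² for example values. It ignores all constants and logarithmic factors hidden by the tilde notation, so the numbers indicate scaling only, not actual sample sizes. The function name `interaction_rates` and the example values are illustrative assumptions, not taken from the paper.

```python
def interaction_rates(S, A, alpha, eps):
    """Return (lower, upper) scaling of per-user interactions.

    Constants and log factors are dropped, so these are orders of
    magnitude, not sample-size prescriptions.
    """
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must lie in (0, 1/2)")
    if eps <= 0:
        raise ValueError("eps must be positive")
    d = min(S, A)
    lower = d * alpha**2 / eps**2  # lower-bound rate (finding 01)
    upper = d * alpha / eps**2     # upper-bound rate (finding 02)
    return lower, upper


# Hypothetical example values: 20 contexts, 10 actions, 10% adversarial users.
lower, upper = interaction_rates(S=20, A=10, alpha=0.1, eps=0.05)
print(f"lower ~ {lower:.1f}, upper ~ {upper:.1f}, gap factor ~ {upper / lower:.1f}")
```

The ratio of the two rates is 1/α, so the gap between the stated upper and lower bounds widens as the adversarial fraction shrinks.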

Abstract

Motivated by online recommendation systems, we propose the problem of finding the optimal policy in multitask contextual bandits when a small fraction $α < 1/2$ of tasks (users) are arbitrary and adversarial. The remaining fraction of good users share the same instance of contextual bandits with $S$ contexts and $A$ actions (items). Naturally, whether a user is good or adversarial is not known in advance. The goal is to robustly learn the policy that maximizes rewards for good users with as few user interactions as possible. Without adversarial users, established results in collaborative filtering show that $O(1/ϵ^{2})$ per-user interactions suffice to learn a good policy, precisely because information can be shared across users. This parallelization gain is fundamentally altered by the presence of adversarial users: unless there is a super-polynomial number of users, we show a…
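
As a rough illustration of why pooling information across users becomes fragile once some users are adversarial, the sketch below simulates a single (context, action) pair. Good users each report an empirical mean reward, while a fraction α of users all report the maximum reward. It then compares a naive average with a textbook trimmed mean. The trimmed mean is a stand-in chosen for simplicity and is not claimed to be the estimator used in the paper. The attack, the function names, and all parameter values are hypothetical.

```python
import random
import statistics


def trimmed_mean(values, alpha):
    """Drop the int(alpha * n) smallest and largest values, average the rest."""
    n = len(values)
    k = int(alpha * n)
    if 2 * k >= n:
        raise ValueError("alpha too large for this sample size")
    return statistics.fmean(sorted(values)[k:n - k])


def simulate(n_users=1000, alpha=0.1, true_mean=0.3, per_user=50, seed=0):
    """Return (naive, trimmed) estimates of one arm's mean reward."""
    rng = random.Random(seed)
    n_bad = int(alpha * n_users)
    # Good users: empirical mean of `per_user` Bernoulli(true_mean) rewards.
    reports = [
        sum(rng.random() < true_mean for _ in range(per_user)) / per_user
        for _ in range(n_users - n_bad)
    ]
    # Assumed coordinated attack: every adversarial user reports reward 1.0.
    reports += [1.0] * n_bad
    return statistics.fmean(reports), trimmed_mean(reports, alpha)


naive, trimmed = simulate()
print(f"true mean 0.30, naive {naive:.3f}, trimmed {trimmed:.3f}")
```

In this toy setup the naive average is pulled toward the adversarial reports, while trimming removes most of that bias at the cost of a smaller residual bias from discarding honest low reports.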

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Adversarial Robustness in Machine Learning