Learning to Attack: A Bandit Approach to Adversarial Context Poisoning
Ray Telikani, Amir H. Gandomi

TL;DR
AdvBandit is a black-box attack that treats context poisoning as a bandit problem, increasing the regret of neural contextual bandits without any access to the victim's internals.
Contribution
The paper introduces a bandit-based attack for adversarial context poisoning that requires no access to the victim model's internals and comes with theoretical performance guarantees.
Findings
AdvBandit outperforms existing attack methods in experiments.
The attack achieves higher victim regret on real-world datasets.
Theoretical analysis confirms sublinear attacker regret.
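For readers unfamiliar with the term, "sublinear regret" is usually read as follows. The notation is an assumption introduced here, not taken from the paper: $R_T$ stands for the attacker's cumulative regret after $T$ rounds, however the paper defines the per-round loss.

```latex
\[
  R_T = o(T)
  \quad\Longleftrightarrow\quad
  \lim_{T \to \infty} \frac{R_T}{T} = 0 .
\]
```

Under this reading, the attacker's average per-round regret vanishes as the number of rounds grows, so its attack strategy approaches the best one available within its comparison class.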
Abstract
Neural contextual bandits are vulnerable to adversarial attacks, where subtle perturbations to rewards, actions, or contexts induce suboptimal decisions. We introduce AdvBandit, a black-box adaptive attack that formulates context poisoning as a continuous-armed bandit problem, enabling the attacker to jointly learn and exploit the victim's evolving policy. The attacker requires no access to the victim's internal parameters, reward function, or gradient information; instead, it constructs a surrogate model using a maximum-entropy inverse reinforcement learning module from observed context-action pairs and optimizes perturbations against this surrogate using projected gradient descent. An upper confidence bound-aware Gaussian process guides arm selection. An attack-budget control mechanism is also introduced to limit detection risk and overhead. We provide theoretical guarantees,…
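The perturbation step described in the abstract, projected gradient descent against a surrogate of the victim's policy, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the paper's implementation: the surrogate is a hypothetical linear-softmax policy with weights `W` (the paper learns its surrogate with maximum-entropy inverse reinforcement learning), the budget is an L-infinity ball of radius `eps`, and the function name `pgd_context_attack` is invented. The Gaussian-process arm selection and the attack-budget controller are omitted.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def pgd_context_attack(x, W, target_arm, eps=0.1, step=0.02, n_steps=50):
    """Perturb context x inside an L-infinity ball of radius eps so that a
    (hypothetical) linear-softmax surrogate policy favours target_arm."""
    delta = np.zeros_like(x)
    for _ in range(n_steps):
        p = softmax(W @ (x + delta))
        # Gradient of log p[target_arm] w.r.t. the context:
        # W[target_arm] - sum_a p[a] * W[a]
        grad = W[target_arm] - p @ W
        # Signed ascent step, then projection back onto the eps-ball.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return x + delta

# Toy usage: two arms, two context features (all values are illustrative).
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])      # assumed surrogate weights, one row per arm
x = np.array([1.0, 0.0])        # clean context; the surrogate prefers arm 0
eps = 0.1
target = 1                      # arm the attacker wants the victim to pick

x_adv = pgd_context_attack(x, W, target, eps=eps)
p_before = softmax(W @ x)[target]
p_after = softmax(W @ x_adv)[target]
print(x_adv, p_before, p_after)
```

In this toy setting the perturbation stays within the budget while raising the surrogate's probability of the target arm. Whether that transfers to the real victim depends on how well the surrogate tracks the victim's evolving policy, which is the part the paper's learning components address.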
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
