Universal Black-Box Reward Poisoning Attack against Offline Reinforcement Learning
Yinglun Xu, Rohan Gumaste, Gagandeep Singh

TL;DR
This paper introduces a universal black-box reward poisoning attack on offline reinforcement learning, capable of manipulating learned policies by altering dataset rewards without knowing the specific learning algorithm.
Contribution
It presents the first universal black-box attack method for offline RL and provides theoretical and empirical validation of its effectiveness.
Findings
The attack successfully manipulates policy performance in various datasets.
It is effective against multiple state-of-the-art offline RL algorithms.
The method operates within limited perturbation budgets.
Abstract
We study the problem of universal black-box reward poisoning attacks against general offline reinforcement learning with deep neural networks. We consider a black-box threat model where the attacker is entirely oblivious to the learning algorithm, and its budget is limited by constraining the amount of corruption at each data point and the total perturbation. We require the attack to be universally efficient against any efficient algorithm that might be used by the agent. We propose an attack strategy called the "policy contrast attack." The idea is to find low- and high-performing policies covered by the dataset and make them appear to be high- and low-performing to the agent, respectively. To the best of our knowledge, we propose the first universal black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and…
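The policy contrast idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `poison_rewards`, the action-distance threshold `radius`, the use of an L1 norm for the total budget, and the toy policies are all assumptions made for illustration. The sketch lowers rewards on logged transitions whose actions are close to a high-performing policy, raises them on transitions close to a low-performing policy, and respects both a per-point cap and a total budget.

```python
import numpy as np

def poison_rewards(states, actions, rewards, pi_good, pi_bad,
                   radius=0.1, per_point_cap=1.0, total_budget=10.0):
    """Hypothetical sketch of a reward-only poisoning step.

    pi_good / pi_bad are assumed callables mapping a batch of states to a
    batch of actions; they stand in for the high- and low-performing
    policies that the attacker has identified in the dataset.
    """
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions, dtype=float)
    delta = np.zeros_like(rewards)

    # Distance from each logged action to what each policy would have done.
    d_good = np.linalg.norm(actions - pi_good(states), axis=1)
    d_bad = np.linalg.norm(actions - pi_bad(states), axis=1)

    # Make good behaviour look bad, then bad behaviour look good.
    # If a transition is close to both policies, the second rule wins.
    delta[d_good <= radius] = -per_point_cap
    delta[d_bad <= radius] = per_point_cap

    # Assumed total budget: L1 norm of the perturbation. Uniform
    # down-scaling keeps every entry within the per-point cap.
    used = np.abs(delta).sum()
    if used > total_budget:
        delta *= total_budget / used
    return rewards + delta

# Toy usage with one-dimensional actions and constant policies.
states = np.zeros((4, 1))
actions = np.array([[1.0], [-1.0], [0.0], [1.0]])
rewards = np.array([1.0, 0.0, 0.5, 1.0])
pi_good = lambda s: np.ones((len(s), 1))    # hypothetical high performer
pi_bad = lambda s: -np.ones((len(s), 1))    # hypothetical low performer
poisoned = poison_rewards(states, actions, rewards, pi_good, pi_bad,
                          per_point_cap=0.5, total_budget=0.75)
```

Only the rewards are modified; states and actions are left untouched, which matches the reward-poisoning setting. The attacker never queries the learner, so the sketch is independent of the offline RL algorithm used by the agent.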
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Forensic Toxicology and Drug Analysis
