Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein

TL;DR
This paper introduces a multi-level certified defense for offline reinforcement learning that uses Differential Privacy to certify both per-state actions and the expected cumulative reward against poisoning attacks, in continuous or discrete spaces and in stochastic or deterministic environments.
Contribution
The work extends certified defenses using Differential Privacy to cover both per-state actions and cumulative rewards in offline RL, applicable to continuous, discrete, stochastic, and deterministic settings.
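As background for the Differential Privacy property this contribution relies on, the standard definition and its group-privacy consequence can be written as below. This is the textbook approximate-DP statement, not necessarily the variant used in the paper. The symbols $\mathcal{M}$ (randomised training algorithm), $D$ and $D'$ (clean and poisoned datasets), $r$ (number of altered elements), $\varepsilon$ and $\delta$ are notation assumed here for illustration.

```latex
% (\varepsilon,\delta)-DP: for all D, D' differing in one element and all measurable S
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta

% Group privacy: if D and D' differ in at most r elements
\Pr[\mathcal{M}(D) \in S] \le e^{r\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
    + \delta \sum_{i=0}^{r-1} e^{i\varepsilon}
```

Read with $D'$ as a poisoned copy of $D$, the second inequality bounds how far $r$ altered elements can shift the probability of any training outcome. An element could be a single transition or a whole trajectory, depending on the level at which poisoning is counted.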
Findings
Performance drop limited to at most 50% with up to 7% of the training data poisoned
Achieves 5 times larger certified radii than prior work
Significantly improves robustness and safety in offline RL
Abstract
Similar to other machine learning frameworks, Offline Reinforcement Learning (RL) is shown to be vulnerable to poisoning attacks, due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments, significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach ensures the performance drops to no more than 50% with up to 7% of the training data poisoned,…
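To make the idea of a per-state certificate concrete, here is a minimal Python sketch of how a DP guarantee could be turned into a certified radius. It is an illustration under simplifying assumptions, not the paper's algorithm. It assumes the training procedure is (eps, delta)-DP, that the probabilities of the most likely and second most likely action at a state (over training randomness) are known exactly rather than estimated, and that the standard group-privacy bound is used. All function and parameter names are hypothetical.

```python
import math


def group_delta(eps: float, delta: float, r: int) -> float:
    """Additive term of group privacy for r changed elements:
    delta * sum_{i=0}^{r-1} exp(i * eps)."""
    return delta * sum(math.exp(i * eps) for i in range(r))


def action_certified(p_top: float, p_runner_up: float,
                     eps: float, delta: float, r: int) -> bool:
    """True if the top action provably stays the most likely one
    when at most r training elements are poisoned (assumed setting)."""
    d_r = group_delta(eps, delta, r)
    # Worst case: the top action loses as much probability as DP allows...
    lower_top = math.exp(-r * eps) * (p_top - d_r)
    # ...while the runner-up gains as much as DP allows.
    upper_runner_up = math.exp(r * eps) * p_runner_up + d_r
    return lower_top > upper_runner_up


def certified_radius(p_top: float, p_runner_up: float,
                     eps: float, delta: float, max_r: int = 10_000) -> int:
    """Largest r (capped at max_r) for which the action is certified.
    The check is monotone in r, so a linear scan is sufficient."""
    r = 0
    while r < max_r and action_certified(p_top, p_runner_up, eps, delta, r + 1):
        r += 1
    return r
```

For example, `certified_radius(0.9, 0.05, eps=0.1, delta=0.0)` returns 14 under these assumptions, since the certificate holds while exp(0.2 r) < 0.9 / 0.05. A practical version would replace the exact probabilities with confidence bounds estimated from repeated training runs.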
Peer Reviews
Decision: ICLR 2025 Poster
* The paper is very well presented. As someone who is not very familiar with RL poisoning, or certified defenses to poisoning, I think the exposition and organization in the paper were great. The flow was logical and the writing, definitions, and lemmas/theorems were all clear. * The experimentation is thorough and provides reasonable empirical justification for the proposed method.
* Although the proposed method does offer transition-level certified robustness, the trajectory-level results presented in Table 1 do not always show an improvement from the proposed method. I'm specifically looking at the Breakout results. I do acknowledge that the proposed method also works for continuous action spaces, and has other flexibility advantages, so I do not consider the previously mentioned results to be of great concern. * The need to train multiple policies could p
1. This paper studies an important problem. The threat of poisoning attacks in the real world should not be underestimated. It is crucial to develop methods that are robust against such attacks. 2. This paper provides a framework that can make a class of RL algorithms robust, which is much more powerful than providing a single robust algorithm. 3. This paper provides strong theoretical guarantees on its learning framework instead of empirical evaluation, which is especially important for develo
1. This work does not investigate the lower bound of the guarantee one can get for a robust DRL algorithm under the poisoning attack. So it is unknown whether the method in this work is optimal, or how large the gap to such a bound is. 2. The attacker's power is too great and unrealistic. The learning algorithm assumes that the attacker can modify some trajectories arbitrarily without any constraint, which may not be realistic. Currently, the method expects a 50% drop in performance for robustness aga
1. The proposed framework extends beyond the previous work and can handle several important variations including continuous action space, stochastic transition, as well as a transition-level poisoning adversary. 2. The authors conducted extensive evaluations and offered concrete discussions and comparisons.
1) One major issue with the current status of the draft is the lack of clarity. Important technical details are missing, e.g., the authors didn't clarify how DP is used in the framework; I assume it is through applying DP-SGD when training each sub-policy to ensure each of the obtained set of policies is DP, but this is definitely an important technical detail that shouldn't be omitted. Moreover, it's confusing when the authors mentioned in line 240 "depending on whether the DP training algorithm p
