Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning

Shijie Liu; Andrew C. Cullen; Paul Montague; Sarah Erfani; Benjamin I. P. Rubinstein

arXiv:2505.20621·cs.LG·May 28, 2025

TL;DR

This paper introduces a multi-level certified defense for offline reinforcement learning that gives stronger guarantees against poisoning attacks, certifying robustness of both per-state actions and the expected cumulative reward across continuous and discrete spaces and stochastic and deterministic environments.

Contribution

The work extends certified defenses using Differential Privacy to cover both per-state actions and cumulative rewards in offline RL, applicable to continuous, discrete, stochastic, and deterministic settings.

Findings

01 Performance drop limited to at most 50% with up to 7% of the training data poisoned

02 Certified radii 5 times larger than prior work

03 Significantly improved robustness and safety in offline RL

Abstract

Similar to other machine learning frameworks, Offline Reinforcement Learning (RL) is shown to be vulnerable to poisoning attacks, due to its reliance on externally sourced datasets, a vulnerability that is exacerbated by its sequential nature. To mitigate the risks posed by RL poisoning, we extend certified defenses to provide larger guarantees against adversarial manipulation, ensuring robustness for both per-state actions, and the overall expected cumulative reward. Our approach leverages properties of Differential Privacy, in a manner that allows this work to span both continuous and discrete spaces, as well as stochastic and deterministic environments -- significantly expanding the scope and applicability of achievable guarantees. Empirical evaluations demonstrate that our approach ensures the performance drops to no more than $50\%$ with up to $7\%$ of the training data poisoned,…
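To make the link between Differential Privacy and a certified reward bound concrete, the following is a minimal sketch under simplifying assumptions, not the paper's actual certification. It assumes a pure ε-DP training algorithm and a nonnegative cumulative reward, in which case group privacy gives E[reward after r poisoned records] ≥ exp(-r·ε) · E[clean reward]. The function names, the ε value, and the retained-reward threshold are hypothetical and chosen only for illustration; the summary above does not state which DP variant or parameters the authors use.

```python
import math


def reward_lower_bound(clean_expected_reward: float, epsilon: float, r: int) -> float:
    """Lower bound on expected cumulative reward after r training records are poisoned.

    Assumption (illustrative, not from the paper): training is pure epsilon-DP
    and the reward is nonnegative, so group privacy yields a factor exp(-r * epsilon).
    """
    if clean_expected_reward < 0:
        raise ValueError("bound assumes a nonnegative expected reward")
    if epsilon <= 0 or r < 0:
        raise ValueError("epsilon must be positive and r nonnegative")
    return math.exp(-r * epsilon) * clean_expected_reward


def certified_radius(epsilon: float, retain_fraction: float) -> int:
    """Largest r such that at least retain_fraction of the clean reward is guaranteed.

    Solves exp(-r * epsilon) >= retain_fraction for integer r.
    """
    if not 0 < retain_fraction <= 1:
        raise ValueError("retain_fraction must lie in (0, 1]")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return math.floor(math.log(1.0 / retain_fraction) / epsilon)


if __name__ == "__main__":
    eps = 0.01  # hypothetical privacy parameter
    radius = certified_radius(eps, retain_fraction=0.5)
    print(f"poisoned records tolerated for a 50% reward guarantee: {radius}")
    print(f"guaranteed reward at that radius: {reward_lower_bound(100.0, eps, radius):.2f}")
```

Under these assumptions a smaller ε (stronger privacy) tolerates more poisoned records for the same guaranteed fraction of reward, at the usual cost of noisier training.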

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

* The paper is very well presented. As someone who is not very familiar with RL poisoning, or certified defenses to poisoning, I think the exposition and organization in the paper was great. The flow was logical and the writing, definitions, and lemmas/theorems were all clear.
* The experimentation is thorough and provides reasonable empirical justification for the proposed method.

Weaknesses

* Although the proposed method does offer transition-level certified robustness, it seems like the trajectory level results presented in Table 1 do not always show an improvement from the proposed method. I'm specifically looking at the Breakout results. I do acknowledge that the proposed method also works for continuous action spaces, and has other flexibility advantages, so I do not consider the previously mentioned results to be of great concern.
* The need to train multiple policies could p

Reviewer 02 · Rating 8 · Confidence 2

Strengths

1. This paper studies an important problem. The threat of poisoning attacks in the real world should not be underestimated. It is crucial to develop methods that are robust against such attacks.
2. This paper provides a framework that can make a class of RL algorithms robust, which is much more powerful than providing a single robust algorithm
3. This paper provides strong theoretical guarantees on its learning framework instead of empirical evaluation, which is especially important for develo

Weaknesses

1. This work does not investigate the lower bound of the guarantee one can get for a robust DRL algorithm under the poisoning attack. So, it is unknown if the method in this work is optimal and the gap between them.
2. The attack's power is too great and unrealistic. The learning algorithm assumes that the attacker can modify some trajectories arbitrarily without any constraint, which may not be realistic for the attack. Currently, the method expects a 50% drop in performance for robustness aga

Reviewer 03 · Rating 3 · Confidence 4

Strengths

1. The proposed framework extends beyond the previous work and can handle several important variations including continuous action space, stochastic transition, as well as a transition-level poisoning adversary.
2. The authors conducted extensive evaluations and offered concrete discussions and comparisons.

Weaknesses

1) One major issue with the current status of the draft is the lack of clarity. Important technical details are missing, e.g., the authors didn't clarify how DP is used in the framework; I assume it is through applying DP-SGD when training each sub-policy to ensure each of the obtained set of policies is DP, but this is definitely important technical detail that shouldn't be omitted. Moreover, it's confusing when the authors mentioned in line 240 "depending on whether the DP training algorithm p

Videos

Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning · SlidesLive
