Reward Delay Attacks on Deep Reinforcement Learning

Anindya Sarkar; Jiarui Feng; Yevgeniy Vorobeychik; Christopher Gill; Ning Zhang

arXiv:2209.03540 · cs.LG · September 9, 2022

Open Access · 1 Repo

TL;DR

This paper shows that delaying reward signals in deep reinforcement learning can manipulate learned policies, exposing a vulnerability in current algorithms. It proposes attack methods that exploit this weakness in reward-based learning.

Contribution

The paper introduces novel reward-delay attacks on Q-learning, demonstrates their effectiveness, and shows that minimal mitigation strategies are insufficient against them.

Findings

01. Reward-delay attacks effectively reduce rewards and manipulate policies.

02. Targeted attacks can achieve specific policy goals despite being more challenging.

03. Minimal mitigation strategies are insufficient to prevent reward-delay attacks.

Abstract

Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are also highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving…
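To make the idea of a delayed reward signal concrete, here is a minimal sketch in Python. It is not the authors' attack: it assumes the simplest possible adversary, one that releases every reward a fixed number of steps late, and pairs it with tabular Q-learning on a toy four-state chain. The names `FixedDelayRewardChannel`, `q_update`, and `run`, the chain environment, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


class FixedDelayRewardChannel:
    """Hypothetical attacker that releases each reward `delay` steps late.

    Until the first delayed reward is available, the learner sees `fill`.
    """

    def __init__(self, delay, fill=0.0):
        self.delay = delay
        self.fill = fill
        self.buffer = deque()

    def push(self, reward):
        # Queue the true reward and emit the one that is `delay` steps old.
        self.buffer.append(reward)
        if len(self.buffer) > self.delay:
            return self.buffer.popleft()
        return self.fill


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def run(delay, steps=2000, seed=0):
    """Q-learning on a toy 4-state chain with rewards passed through the channel.

    Action 1 moves right, action 0 moves left. The true reward is 1.0 for
    taking action 1 in the last state and 0.0 otherwise (an assumed toy task).
    """
    rng = random.Random(seed)
    n_states = 4
    q = [[0.0, 0.0] for _ in range(n_states)]
    channel = FixedDelayRewardChannel(delay)
    s = 0
    for _ in range(steps):
        a = rng.randrange(2)  # uniform random exploration
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        true_r = 1.0 if (s == n_states - 1 and a == 1) else 0.0
        # The learner assumes synchrony: it credits whatever reward arrives
        # now to the current (state, action) pair, even if it is stale.
        observed_r = channel.push(true_r)
        q_update(q, s, a, observed_r, s_next)
        s = s_next
    return q


if __name__ == "__main__":
    for d in (0, 3):
        print(f"delay={d}:", run(delay=d))
```

Comparing the Q-tables from `run(delay=0)` and `run(delay=3)` shows how stale rewards get credited to state-action pairs that did not earn them, which is the synchrony assumption the abstract refers to. The attacks described in the paper choose delays strategically rather than using a fixed lag.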

Code & Models

Repositories

anindyasarkarIITH/Reward_Delay_Attack_DRL
PyTorch · Official

Taxonomy

Topics: Cardiac electrophysiology and arrhythmias · Adversarial Robustness in Machine Learning · Neural dynamics and brain function

Methods: Q-Learning