Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Yiren Zhao; Ilia Shumailov; Han Cui; Xitong Gao; Robert Mullins; Ross Anderson

arXiv:1909.02918 · cs.LG · November 25, 2019

TL;DR

This paper demonstrates that black-box adversarial attacks on reinforcement learning agents can be performed effectively by using sequence-to-sequence models to predict the agents' actions. It also introduces a new attack method and highlights a methodological issue in prior evaluations.

Contribution

The work introduces an approximation model that predicts an RL agent's actions with high accuracy in a black-box setting, and proposes a novel delayed-misbehavior attack.

Findings

01. Sequence-to-sequence models accurately predict RL agents' actions in black-box settings.

02. Adversarial samples transfer between models but often outperform random Gaussian noise only marginally.

03. Adversarial samples can trigger misbehavior after a time delay, so they can act as time bombs.
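The second finding suggests that an attack should be judged against random noise of the same magnitude. A minimal sketch of such an equal-budget comparison follows, assuming perturbations are flat lists of floats and that "attack success" means the agent's chosen action changes. `matched_gaussian_noise` and `flip_rate` are hypothetical helpers for illustration, not code from the paper.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in vec))


def matched_gaussian_noise(perturbation: Sequence[float], rng: random.Random) -> Vector:
    """Hypothetical baseline: Gaussian noise with the same L2 norm as `perturbation`."""
    # Sample isotropic Gaussian noise, then rescale it so its L2 norm equals
    # that of the adversarial perturbation (an equal-budget random baseline).
    noise = [rng.gauss(0.0, 1.0) for _ in perturbation]
    norm = l2_norm(noise)
    if norm == 0.0:
        return [0.0] * len(perturbation)
    scale = l2_norm(perturbation) / norm
    return [n * scale for n in noise]


def flip_rate(
    policy: Callable[[Sequence[float]], int],
    observations: Sequence[Sequence[float]],
    perturbations: Sequence[Sequence[float]],
) -> float:
    """Hypothetical success metric: fraction of observations whose action changes."""
    flips = 0
    for obs, delta in zip(observations, perturbations):
        perturbed = [o + d for o, d in zip(obs, delta)]
        if policy(perturbed) != policy(obs):
            flips += 1
    return flips / len(observations)
```

Calling `flip_rate` once with the adversarial perturbations and once with their matched-noise counterparts gives the two success rates to compare. Matching the L2 norm is one choice of budget; matching the L∞ norm would be an equally reasonable assumption.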

Abstract

Recent research on reinforcement learning (RL) has suggested that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how such samples can be generalised from White-box and Grey-box attacks to a strong Black-box case, where the attacker has no knowledge of the agents, their training parameters and their training methods. We use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. First, we show our approximation model, based on time-series information from the agent, consistently predicts RL agents' future actions with high accuracy in a Black-box setup on a wide range of games and RL algorithms. Second, we find that although adversarial samples are transferable from the target model to our RL agents, they often outperform random Gaussian noise only marginally. This highlights a…
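The approximation model described above is trained on time-series information gathered by observing the agent. The sketch below shows one hypothetical way to frame that data as sequence-to-sequence pairs (a window of past observations as input, the agent's next `horizon` actions as target) and to score predictions both per step and per whole sequence. It does not implement the paper's network; the window length `history`, the `horizon`, the alignment of observations with actions, and the helper names are all assumptions.

```python
from typing import List, Sequence, Tuple, TypeVar

O = TypeVar("O")  # observation type (e.g. a frame or feature vector)
A = TypeVar("A")  # action type (e.g. a discrete action index)


def make_seq2seq_pairs(
    observations: Sequence[O],
    actions: Sequence[A],
    history: int,
    horizon: int,
) -> List[Tuple[List[O], List[A]]]:
    """Hypothetical framing: last `history` observations -> next `horizon` actions.

    Assumes actions[t] is the action the agent took after seeing observations[t].
    horizon=1 corresponds to single-action prediction.
    """
    if len(observations) != len(actions):
        raise ValueError("observations and actions must be aligned")
    pairs = []
    for t in range(history - 1, len(actions) - horizon + 1):
        source = list(observations[t - history + 1 : t + 1])
        target = list(actions[t : t + horizon])
        pairs.append((source, target))
    return pairs


def sequence_accuracy(
    predicted: Sequence[Sequence[A]], actual: Sequence[Sequence[A]]
) -> Tuple[float, float]:
    """Return (per-step accuracy, exact whole-sequence match rate)."""
    step_hits = steps = exact = 0
    for pred, true in zip(predicted, actual):
        step_hits += sum(1 for p, a in zip(pred, true) if p == a)
        steps += len(true)
        exact += int(list(pred) == list(true))
    return step_hits / steps, exact / len(actual)
```

Reporting both numbers separates "how many individual actions were right" from "how often the whole future sequence was right", which diverge quickly as the horizon grows.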
