RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors

Fengshuo Bai; Runze Liu; Yali Du; Ying Wen; Yaodong Yang

arXiv:2412.10713 · cs.LG · December 17, 2024

TL;DR

This paper introduces RAT, an adversarial attack method that manipulates deep reinforcement learning agents into specific targeted behaviors, surpassing existing methods in effectiveness and robustness across robotic and MuJoCo tasks.

Contribution

RAT is the first method to enable universal, targeted behavior attacks on DRL agents by aligning an intention policy with human preferences and dynamically adjusting state occupancy measures.
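The idea of aligning a policy with human preferences can be illustrated with a minimal sketch. The source does not say how RAT models preferences, so the Bradley-Terry model below is an assumption borrowed from common preference-based RL practice, not the authors' implementation. The function names `preference_probability` and `preference_loss` are hypothetical.

```python
import math
from typing import Sequence


def preference_probability(rewards_a: Sequence[float],
                           rewards_b: Sequence[float]) -> float:
    """Probability that segment A is preferred over segment B.

    Assumption: a Bradley-Terry model over summed per-step reward
    estimates, i.e. a logistic function of the return difference.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a: Sequence[float],
                    rewards_b: Sequence[float],
                    label: float) -> float:
    """Cross-entropy between the predicted preference and a human label.

    label is 1.0 if the human preferred A, 0.0 if B, 0.5 for a tie.
    """
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps)
             + (1.0 - label) * math.log(1.0 - p + eps))
```

Minimizing a loss of this kind over labeled segment pairs yields reward estimates that agree with the human labels. A policy trained on those estimates would then play the role of a behavioral target, under the assumptions stated above.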

Findings

01. RAT outperforms existing attack algorithms in robotic simulations.

02. RAT effectively guides agents to adopt human-aligned behaviors in MuJoCo tasks.

03. RAT enhances the robustness of DRL agents against targeted behavior manipulations.

Abstract

Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker's objectives, often bypassing traditional reward-based defenses. Prior methods have primarily focused on reducing cumulative rewards; however, rewards are typically too generic to capture complex safety requirements effectively. As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. RAT trains an intention policy that is explicitly aligned with human preferences, serving as a precise behavioral target for the adversary. Concurrently,…

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Linear Layer · ADaptive gradient method with the OPTimal convergence rate · Adam · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Byte Pair Encoding