RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du, Ying Wen, Yaodong Yang

TL;DR
This paper introduces RAT, an adversarial attack method for deep reinforcement learning agents that steers the victim toward specific target behaviors rather than only reducing its reward. It is reported to outperform existing attack methods on robotic simulation tasks and to guide agents toward human-aligned behaviors on MuJoCo tasks.
Contribution
RAT is the first method to enable universal, targeted behavior attacks on DRL agents by aligning an intention policy with human preferences and dynamically adjusting state occupancy measures.
Findings
RAT outperforms existing attack algorithms in robotic simulations.
RAT effectively guides agents to adopt human-aligned behaviors in MuJoCo tasks.
RAT can also be used to improve the robustness of DRL agents against targeted behavior manipulation.
Abstract
Evaluating deep reinforcement learning (DRL) agents against targeted behavior attacks is critical for assessing their robustness. These attacks aim to manipulate the victim into specific behaviors that align with the attacker's objectives, often bypassing traditional reward-based defenses. Prior methods have primarily focused on reducing cumulative rewards; however, rewards are typically too generic to capture complex safety requirements effectively. As a result, focusing solely on reward reduction can lead to suboptimal attack strategies, particularly in safety-critical scenarios where more precise behavior manipulation is needed. To address these challenges, we propose RAT, a method designed for universal, targeted behavior attacks. RAT trains an intention policy that is explicitly aligned with human preferences, serving as a precise behavioral target for the adversary. Concurrently,…
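The visible part of the abstract says the intention policy is aligned with human preferences but does not say how. One common way to learn from pairwise preferences is a Bradley-Terry model over trajectory segments. The sketch below assumes that model, which this excerpt does not confirm as RAT's choice. The function names and the toy reward lists are hypothetical and purely illustrative.

```python
import math


def segment_return(rewards):
    """Sum of predicted per-step rewards over one trajectory segment."""
    return sum(rewards)


def preference_probability(rewards_a, rewards_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: preferences follow a logistic model of the return difference.
    """
    diff = segment_return(rewards_a) - segment_return(rewards_b)
    # Numerically stable sigmoid of the return difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy against a label (1.0 if A is preferred, 0.0 if B is)."""
    p = preference_probability(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


if __name__ == "__main__":
    # Toy per-step reward predictions for two segments (made-up numbers).
    seg_a = [1.0, 1.0]
    seg_b = [0.0, 0.0]
    print(preference_probability(seg_a, seg_b))
    print(preference_loss(seg_a, seg_b, 1.0))
```

In a setup like this, minimizing the loss over many labeled segment pairs would fit a reward signal that a policy can then be trained against. How RAT actually uses the preference signal is not recoverable from the truncated abstract.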
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · Linear Layer · ADaptive gradient method with the OPTimal convergence rate · Adam · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Byte Pair Encoding
