Don't Watch Me: A Spatio-Temporal Trojan Attack on Deep-Reinforcement-Learning-Augment Autonomous Driving
Yinbo Yu, Jiajia Liu

TL;DR
This paper demonstrates a novel spatio-temporal Trojan attack on deep reinforcement learning-based autonomous driving systems, revealing vulnerabilities in traffic feature modeling that can be exploited stealthily and effectively.
Contribution
It introduces a new spatio-temporal Trojan attack method on DRL policies for autonomous driving, highlighting security risks in traffic feature-based decision systems.
Findings
The Trojan attack achieves a success rate above 98.5%
The attack remains effective against advanced defenses
Spatio-temporal features improve DRL performance but also increase vulnerability
Abstract
Deep reinforcement learning (DRL) is one of the most popular algorithms for realizing an autonomous driving (AD) system. The key success factor of DRL is that it embraces the perception capability of deep neural networks, which, however, have been proven vulnerable to Trojan attacks. Trojan attacks have been widely explored in supervised learning (SL) tasks (e.g., image classification), but rarely in sequential decision-making tasks solved by DRL. Hence, in this paper, we explore Trojan attacks on DRL for AD tasks. First, we propose a spatio-temporal DRL algorithm based on the recurrent neural network and attention mechanism to prove that capturing spatio-temporal traffic features is the key factor in the effectiveness and safety of a DRL-augmented AD system. We then design a spatio-temporal Trojan attack on DRL policies, where the trigger is hidden in a sequence of spatial and temporal…
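The abstract says only that the trigger is hidden in a sequence of spatial and temporal features (the description is truncated above), so the authors' exact trigger is not shown here. To make the general idea of a sequence-based trigger concrete, here is a minimal sketch under loudly stated assumptions: each observation is a toy (relative_x, relative_y) position of one neighbouring vehicle, and the class `SequenceTrigger`, the function `backdoored_action`, the example pattern, and the tolerance are all hypothetical illustrations, not the paper's design.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# Hypothetical toy observation: a neighbour's (relative_x, relative_y) in metres.
Obs = Tuple[float, float]


class SequenceTrigger:
    """Hypothetical sequence-matching trigger (not the authors' method).

    Fires only when the last len(pattern) observations each lie within
    `tol` (per coordinate) of the corresponding pattern entry, in order.
    """

    def __init__(self, pattern: Sequence[Obs], tol: float = 0.5) -> None:
        self.pattern: List[Obs] = list(pattern)
        self.tol = tol
        # Sliding window holding only the most recent len(pattern) observations.
        self.window: Deque[Obs] = deque(maxlen=len(self.pattern))

    def update(self, obs: Obs) -> bool:
        """Add one observation and report whether the full pattern just matched."""
        self.window.append(obs)
        if len(self.window) < len(self.pattern):
            return False  # not enough history yet to match a temporal pattern
        return all(
            abs(ox - px) <= self.tol and abs(oy - py) <= self.tol
            for (ox, oy), (px, py) in zip(self.window, self.pattern)
        )


def backdoored_action(clean_action: int, triggered: bool, target_action: int) -> int:
    """Hypothetical backdoored policy output: the clean action unless triggered."""
    return target_action if triggered else clean_action


if __name__ == "__main__":
    # Hypothetical pattern: a neighbour approaches from behind and cuts in.
    pattern = [(-10.0, 3.5), (-6.0, 3.5), (-2.0, 1.8), (2.0, 0.0)]
    trigger = SequenceTrigger(pattern, tol=0.5)
    stream = pattern + [(5.0, 0.0)]
    print([trigger.update(o) for o in stream])  # → [False, False, False, True, False]
```

In this toy, every individual position is an ordinary traffic state, so inspecting any single frame reveals nothing unusual; only the ordered sequence over several time steps activates the backdoor. This is the intuition behind hiding a trigger across time rather than in one input.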
Taxonomy
Topics: Adversarial Robustness in Machine Learning
