Backdoor Attacks on Multiagent Collaborative Systems
Shuo Chen, Yue Qiu, Jie Zhang

TL;DR
This paper introduces a novel training framework for backdoor attacks on multiagent collaborative systems, enabling an adversary agent to trigger abnormal behaviors in other agents with only a few actions.
Contribution
The paper proposes a new method to train adversarial trigger policies using auxiliary rewards, effectively inducing abnormal behaviors in multiagent systems with few actions.
Findings
The adversary can trigger abnormal behaviors in other agents with only a few actions.
Auxiliary rewards effectively guide the training of the trigger policy.
The method outperforms the baseline in both efficiency and effectiveness.
Abstract
Backdoor attacks on reinforcement learning implant a backdoor in a victim agent's policy. Once the victim observes the trigger signal, it switches to an abnormal mode and fails its task. Most existing attacks assume the adversary can arbitrarily modify the victim's observations, which may not be practical. One prior work lets an adversary agent use its own actions to affect its opponent in two-agent competitive games, so that the opponent quickly fails after observing certain trigger actions. However, in multiagent collaborative systems, agents may not always be able to observe others. When and how much the adversary agent can affect others is uncertain, and we want the adversary agent to trigger others as few times as possible. To solve this problem, we first design a novel training framework to produce auxiliary rewards that measure the extent to which the other…
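To make the mechanism in the abstract concrete, here is a minimal toy sketch of a backdoored victim policy that behaves normally until it observes a fixed sequence of adversary actions. It is not the paper's method or training framework. The class name BackdooredPolicy, the integer action encoding, the fixed trigger sequence, and the choice to reset the observation window whenever the adversary is out of view are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Optional, Sequence


class BackdooredPolicy:
    """Toy victim policy with a backdoor (hypothetical, for illustration only)."""

    def __init__(
        self,
        normal_policy: Callable[[object], int],
        abnormal_policy: Callable[[object], int],
        trigger: Sequence[int],
    ) -> None:
        self.normal_policy = normal_policy
        self.abnormal_policy = abnormal_policy
        self.trigger = tuple(trigger)
        # Sliding window over the most recently observed adversary actions.
        self.recent: Deque[int] = deque(maxlen=len(self.trigger))
        self.triggered = False

    def act(self, own_obs: object, adversary_action: Optional[int]) -> int:
        if adversary_action is None:
            # Assumption: the adversary is not observable at this step,
            # so any partially observed trigger sequence is discarded.
            self.recent.clear()
        else:
            self.recent.append(adversary_action)
            if tuple(self.recent) == self.trigger:
                # Once triggered, the victim stays in the abnormal mode.
                self.triggered = True
        policy = self.abnormal_policy if self.triggered else self.normal_policy
        return policy(own_obs)


# Usage: action 0 stands in for normal behavior, action 9 for abnormal behavior.
victim = BackdooredPolicy(lambda obs: 0, lambda obs: 9, trigger=[1, 2])
history = [victim.act(None, a) for a in (1, None, 2, 1, 2, 3)]
```

In this sketch the first attempt at the trigger fails because the victim loses sight of the adversary between the two trigger actions, which mirrors the partial-observability concern raised in the abstract. Only the second, uninterrupted attempt flips the victim into the abnormal mode.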
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
