Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire
Siddhartha Datta, Nigel Shadbolt

TL;DR
This paper investigates multi-agent backdoor attacks in collaborative learning, revealing a consistent backfire effect where multiple attackers often fail collectively, prompting a need to rethink defense strategies.
Contribution
It uncovers the backfire phenomenon in multi-agent backdoor attacks and analyzes a range of attack configurations, finding that the equilibrium attack success rate settles at the lower bound.
Findings
Agents suffer from a low collective attack success rate
The backfire phenomenon is consistent across different attack setups
Results suggest re-evaluating backdoor defenses in practical environments
Abstract
Malicious agents in collaborative learning and outsourced data collection threaten the training of clean models. Backdoor attacks, where an attacker poisons a model during training to successfully achieve targeted misclassification, are a major concern to train-time robustness. In this paper, we investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate. We examine different modes of backdoor attack configurations, non-cooperation / cooperation, joint distribution shifts, and game setups to return an equilibrium attack success rate at the lower bound. The results motivate the re-evaluation of backdoor defense research for practical environments.
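To make the central quantity concrete, the following is a minimal sketch of how a collective attack success rate could be computed. It assumes that each agent's success rate is the fraction of its trigger-stamped inputs that the victim model assigns to that agent's target label, and that the collective rate is the plain mean over agents. Both definitions, and the names `attack_success_rate`, `collective_attack_success_rate`, and `toy_predict`, are illustrative assumptions and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple


def attack_success_rate(
    predict: Callable[[int], int],
    triggered_inputs: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs classified as the attacker's target label.

    Assumed definition for illustration; the paper may measure this differently.
    """
    if not triggered_inputs:
        raise ValueError("triggered_inputs must not be empty")
    hits = sum(1 for x in triggered_inputs if predict(x) == target_label)
    return hits / len(triggered_inputs)


def collective_attack_success_rate(
    predict: Callable[[int], int],
    agents: Sequence[Tuple[Sequence[int], int]],
) -> float:
    """Mean of the per-agent success rates (assumed aggregation).

    Each agent is a pair (triggered_inputs, target_label).
    """
    if not agents:
        raise ValueError("agents must not be empty")
    rates = [attack_success_rate(predict, xs, t) for xs, t in agents]
    return sum(rates) / len(rates)


def toy_predict(x: int) -> int:
    """Hypothetical stand-in for a trained victim model: labels are x mod 3."""
    return x % 3


# Two hypothetical attackers, each with its own triggered inputs and target label.
agents = [
    ([3, 6, 7], 0),  # predictions 0, 0, 1: two of three hit the target
    ([1, 4], 1),     # predictions 1, 1: both hit the target
]
collective = collective_attack_success_rate(toy_predict, agents)
print(f"collective attack success rate: {collective:.3f}")
```

In a real experiment `predict` would wrap the trained victim model and the inputs would be held-out samples stamped with each attacker's trigger. The backfire effect described in the abstract corresponds to this collective value staying low as more attackers are added.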
Taxonomy
Topics: Network Security and Intrusion Detection · Crime, Illicit Activities, and Governance · Mathematical and Theoretical Epidemiology and Ecology Models
