When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks
Donald Flynn, Hadas Yaron Goldhirsh, Jonathan P. Keating, Inbar Seroussi

TL;DR
This paper reveals counter-intuitive effects of trigger strength in high-dimensional backdoor attacks, showing that stronger triggers can sometimes reduce attack success and improve test accuracy.
Contribution
It provides a high-dimensional theoretical analysis of backdoor attacks, deriving closed-form results and identifying phenomena not explained by classical analysis.
Findings
Clean test accuracy increases with training trigger strength.
Attack success peaks at a finite training trigger strength, then declines.
The most damaging trigger direction is the minimum eigenvector of the data covariance.
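To make the last finding concrete, here is a minimal sketch of how one might extract the minimum eigenvector of a covariance matrix with NumPy. The function name `min_eigen_direction` and the example matrix are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def min_eigen_direction(cov):
    """Return the unit eigenvector of a symmetric matrix with the smallest eigenvalue.

    Hypothetical helper for illustration; the paper's own construction may differ.
    """
    cov = np.asarray(cov, dtype=float)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # so the first column of the eigenvector matrix is the minimum eigenvector.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    return eigenvectors[:, 0]

# Illustrative covariance whose lowest-variance axis is the third coordinate.
example_cov = np.diag([4.0, 1.0, 0.25])
direction = min_eigen_direction(example_cov)
```

For this diagonal example the returned direction lies along the third coordinate axis, up to sign, since that axis carries the smallest variance.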
Abstract
Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime (), varying the training trigger strength against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with the training trigger strength; (ii) attack success peaks at a finite training trigger strength and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to as the mechanism behind (i), invisible to classical analysis. Experiments on CIFAR-10 and Gaussian surrogates…
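The setting described in the abstract can be explored with a small simulation. The sketch below is not the authors' code: it assumes a two-class Gaussian mixture with identity covariance, a random unit-norm trigger direction, ridge regression on ±1 labels as the squared-loss model, and illustrative values for every parameter (`n`, `d`, `poison_frac`, `lam`). It only measures clean accuracy and attack success for a given training trigger strength and makes no claim to reproduce the paper's curves.

```python
import numpy as np

def backdoor_ridge_experiment(alpha_train, alpha_test=1.0, n=400, d=200,
                              poison_frac=0.1, lam=0.1, n_test=2000, seed=0):
    """Train ridge regression on poisoned Gaussian-mixture data.

    Returns (clean_accuracy, attack_success). All defaults are assumptions
    chosen for illustration, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    mu = np.ones(d) / np.sqrt(d)            # class mean direction (assumed)
    trigger = rng.standard_normal(d)
    trigger /= np.linalg.norm(trigger)      # unit-norm trigger direction (assumed)

    # Clean training data: x = y * mu + standard Gaussian noise.
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + rng.standard_normal((n, d))

    # Poison a fraction of samples: add the trigger and relabel to the target class +1.
    n_poison = int(poison_frac * n)
    X[:n_poison] += alpha_train * trigger
    y[:n_poison] = 1.0

    # Ridge regression (squared loss with an L2 penalty), solved in closed form.
    w = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

    # Clean test accuracy on fresh, untriggered data.
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    X_test = y_test[:, None] * mu + rng.standard_normal((n_test, d))
    clean_acc = np.mean(np.sign(X_test @ w) == y_test)

    # Attack success: class -1 test points with the fixed test trigger,
    # counted as a success when classified as the target class +1.
    triggered = X_test[y_test == -1.0] + alpha_test * trigger
    attack_success = np.mean(triggered @ w > 0)
    return float(clean_acc), float(attack_success)

# Sweep the training trigger strength against a fixed test trigger.
results = {a: backdoor_ridge_experiment(a) for a in (0.5, 1.0, 2.0, 4.0, 8.0)}
```

Each entry of `results` maps a training trigger strength to a `(clean_accuracy, attack_success)` pair; averaging over several seeds would give smoother trends than a single run.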
