When Stronger Triggers Backfire: A High-Dimensional Theory of Backdoor Attacks

Donald Flynn; Hadas Yaron Goldhirsh; Jonathan P. Keating; Inbar Seroussi

arXiv:2605.22481·cs.LG·May 22, 2026

TL;DR

This paper reveals counter-intuitive effects of trigger strength in high-dimensional backdoor attacks, showing that stronger triggers can sometimes reduce attack success and improve test accuracy.

Contribution

It provides a high-dimensional theoretical analysis of backdoor attacks, deriving closed-form results and identifying phenomena not explained by classical analysis.

Findings

01 Clean test accuracy increases with the training trigger strength.

02 Attack success peaks at a finite training trigger strength and then declines.

03 The most damaging trigger direction is the minimum eigenvector of the data covariance.

Abstract

Backdoor poisoning attacks behave counter-intuitively in high dimensions: stronger training triggers can help the defender. We study regularised generalised linear models on Gaussian-mixture data in the proportional regime ($p / n \to \kappa$), varying the training trigger strength $\alpha$ against a fixed test trigger. Three phenomena emerge: (i) clean test accuracy increases with $\alpha$; (ii) attack success peaks at a finite $\alpha$ and then declines; and (iii) the most damaging trigger direction is the minimum eigenvector of the data covariance. We prove all three results in closed form for the squared loss, and extend (i) and (ii) to general convex GLM losses via a Gaussian-proxy fixed-point system. We identify a finite-sample noise floor proportional to $\kappa$ as the mechanism behind (i), invisible to classical $n \gg p$ analysis. Experiments on CIFAR-10 and Gaussian surrogates…
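The setup in the abstract can be explored numerically with a small simulation. The sketch below is illustrative only and is not the authors' code: it assumes a two-class Gaussian mixture with isotropic covariance, a ridge (squared-loss) classifier, and a poisoning scheme that adds the trigger to a fraction of training points and relabels them as the target class. The function name `simulate`, the defaults (`rho`, `lam`, `alpha_test`, the class mean, the trigger direction), and the poisoning scheme are all assumptions made for this example. Because the covariance is isotropic, it cannot illustrate finding (iii).

```python
import numpy as np


def simulate(alpha, alpha_test=1.0, n=400, p=200, rho=0.1, lam=1.0,
             n_test=2000, seed=0):
    """Train ridge regression on poisoned Gaussian-mixture data.

    All parameter names and defaults are hypothetical choices for this sketch.
    Returns (clean test accuracy, attack success rate).
    """
    rng = np.random.default_rng(seed)

    # Assumed class mean and trigger direction (orthogonal unit directions).
    mu = np.zeros(p)
    mu[0] = 2.0
    t = np.zeros(p)
    t[1] = 1.0

    def sample(m):
        # x = y * mu + z with z ~ N(0, I_p) and labels y in {-1, +1}.
        y = rng.choice([-1.0, 1.0], size=m)
        x = y[:, None] * mu + rng.standard_normal((m, p))
        return x, y

    X, y = sample(n)

    # Poison a fraction rho of the training set: add the training trigger
    # with strength alpha and relabel those points as the target class +1.
    n_poison = int(rho * n)
    X[:n_poison] += alpha * t
    y[:n_poison] = 1.0

    # Ridge solution for the squared loss with penalty lam.
    w = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

    # Clean accuracy on untouched test data.
    X_te, y_te = sample(n_test)
    clean_acc = float(np.mean(np.sign(X_te @ w) == y_te))

    # Attack success: class -1 test points carrying the fixed test trigger
    # that the classifier assigns to the target class +1.
    triggered = X_te[y_te == -1.0] + alpha_test * t
    attack_success = float(np.mean(triggered @ w > 0))
    return clean_acc, attack_success


if __name__ == "__main__":
    # Sweep the training trigger strength against a fixed test trigger.
    for a in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
        acc, asr = simulate(alpha=a)
        print(f"alpha={a:4.1f}  clean_acc={acc:.3f}  attack_success={asr:.3f}")
```

Here $p / n = 0.5$ stands in for the proportional regime at finite size. Whether the sweep shows the trends described in the abstract depends on the assumed parameters, so treat the printed numbers as a starting point for experimentation rather than a reproduction of the paper's results.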
