Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions
Sarthak Choudhary, Atharv Singh Patlan, Nils Palumbo, Ashish Hooda, Kassem Fawaz, Somesh Jha

TL;DR
This paper introduces Sparse Backdoor, an attack that embeds sparse, structured perturbations into the weights of pre-trained classifiers. Under standard hardness assumptions, the backdoored model is provably indistinguishable from a Gaussian-dithered reference model that is functionally equivalent to the original.
Contribution
The authors propose a novel backdoor attack that is provably undetectable and formalize its indistinguishability using a connection to Sparse PCA detection under standard hardness assumptions.
Findings
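Backdoor perturbations are masked with independent isotropic Gaussian dither, which defines the clean reference distribution against which undetectability is formalized.
Distinguishing the backdoored model from the dithered reference is at least as hard as Sparse PCA detection.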
The attack applies to various pre-trained image classifiers, including CNNs and Vision Transformers.
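For readers unfamiliar with the Sparse PCA detection problem named in the findings, one common textbook formulation is the spiked covariance hypothesis test below. The symbols n, d, k, \theta, and v are illustrative assumptions, and the exact variant used in the paper may differ.

```latex
% One common formulation of Sparse PCA detection (spiked covariance model).
% Symbols n, d, k, \theta, v are illustrative and not taken from the paper.
\[
H_0:\; x_1,\dots,x_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,\, I_d)
\qquad\text{vs.}\qquad
H_1:\; x_1,\dots,x_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0,\; I_d + \theta\, v v^{\top}\right),
\]
\[
\text{where } v \in \mathbb{R}^d,\quad \|v\|_2 = 1,\quad \|v\|_0 \le k,\quad \theta > 0 .
\]
```

Informally, the tester must decide whether Gaussian samples hide a weak signal along an unknown k-sparse direction, which loosely mirrors a sparse weight perturbation hidden under Gaussian dither.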
Abstract
We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation along a randomly chosen direction into a small subset of columns at each fully connected layer, propagating a trigger signal to an adversary-chosen target class, and masks the perturbation with an independent isotropic Gaussian dither. The dither serves a single technical purpose: it induces a clean reference distribution anchored at the pre-trained weights, against which undetectability can be formalized. Under a mild margin condition on the pre-trained classifier, we show that the dithered reference is functionally equivalent to the original classifier. We prove that distinguishing the backdoor-injected model from this reference is at least as…
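The single-layer mechanics described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' construction: the function names and the parameters `k` (number of perturbed columns), `theta` (perturbation strength), and `sigma` (dither scale) are hypothetical. How the paper selects columns, couples layers, or designs the trigger is not stated in this summary.

```python
import numpy as np


def dithered_reference(W, sigma, rng):
    """Clean reference: pre-trained weights plus isotropic Gaussian dither."""
    return W + sigma * rng.standard_normal(W.shape)


def sparse_backdoor_layer(W, k, theta, sigma, rng):
    """Toy backdoored layer (hypothetical parameterization, not the paper's).

    Adds theta * u to k randomly chosen columns of W, where u is a random
    unit direction in output space, then applies the same kind of Gaussian
    dither used for the clean reference.
    """
    d_out, d_in = W.shape

    # Randomly chosen direction, normalized to unit length.
    u = rng.standard_normal(d_out)
    u /= np.linalg.norm(u)

    # Small subset of columns that will carry the perturbation.
    cols = rng.choice(d_in, size=k, replace=False)

    # Structured sparse perturbation: zero outside the chosen columns.
    delta = np.zeros_like(W, dtype=float)
    delta[:, cols] = theta * u[:, None]

    W_bd = W + delta + sigma * rng.standard_normal(W.shape)
    return W_bd, cols, u


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 32))  # stand-in for pre-trained weights
    W_ref = dithered_reference(W, sigma=0.1, rng=rng)
    W_bd, cols, u = sparse_backdoor_layer(W, k=3, theta=0.5, sigma=0.1, rng=rng)
    print("perturbed columns:", sorted(cols.tolist()))
```

In this sketch, an input with mass on the selected columns is pushed along `u` at the layer output, while the dither makes each individual weight look like a noisy copy of the original.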
