Undetectable Backdoors in Model Parameters: Hiding Sparse Secrets in High Dimensions

Sarthak Choudhary; Atharv Singh Patlan; Nils Palumbo; Ashish Hooda; Kassem Fawaz; Somesh Jha

arXiv:2605.04209·cs.CR·May 7, 2026


TL;DR

This paper introduces Sparse Backdoor, a provably undetectable attack that embeds sparse, structured perturbations into pre-trained image classifiers. Under standard hardness assumptions, the backdoored model is computationally indistinguishable from a Gaussian-dithered reference copy of the original model.

Contribution

The authors propose a novel backdoor attack that is provably undetectable and formalize its indistinguishability using a connection to Sparse PCA detection under standard hardness assumptions.
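
For orientation, Sparse PCA detection is commonly stated as the hypothesis test sketched below. This is the textbook spiked-covariance formulation, not necessarily the exact parameterization used in the paper; the symbols $n$, $d$, $k$, $\theta$, and $v$ are notation assumed here for illustration.

```latex
% Sparse PCA detection (standard spiked-covariance form; notation assumed here).
% Given n i.i.d. samples x_1, ..., x_n in R^d, decide between:
\begin{align*}
  H_0 &: \; x_1, \dots, x_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0,\; I_d\right), \\
  H_1 &: \; x_1, \dots, x_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}\!\left(0,\; I_d + \theta\, v v^{\top}\right),
  \qquad \|v\|_2 = 1,\;\; \|v\|_0 \le k .
\end{align*}
% Under H_1 the covariance has one sparse "spike" of strength theta > 0 along v.
```

Read loosely, the null hypothesis plays the role of the clean reference and the alternative plays the role of a model carrying a hidden sparse direction.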

Findings

01

Backdoor perturbations are masked with independent isotropic Gaussian dither, which defines the clean reference distribution against which undetectability is formalized.

02

Distinguishing the backdoored model from the dithered reference is at least as hard as Sparse PCA detection.

03

The attack applies to various pre-trained image classifiers, including CNNs and Vision Transformers.

Abstract

We present Sparse Backdoor, a supply-chain attack that plants a provably undetectable backdoor in pre-trained image classifiers, including convolutional networks and Vision Transformers. The attack injects a structured sparse perturbation along a randomly chosen direction into a small subset of columns at each fully connected layer, propagating a trigger signal to an adversary-chosen target class, and masks the perturbation with an independent isotropic Gaussian dither. The dither serves a single technical purpose: it induces a clean reference distribution anchored at the pre-trained weights, against which undetectability can be formalized. Under a mild margin condition on the pre-trained classifier, we show that the dithered reference is functionally equivalent to the original classifier. We prove that distinguishing the backdoor-injected model from this reference is at least as…
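
The weight-level idea in the abstract can be illustrated with a minimal NumPy sketch for a single fully connected layer. It is not the authors' construction. The function names, the parameters `k`, `theta`, and `sigma`, the uniform choice of columns, and the choice of a direction in output space are all assumptions made for illustration. The sketch also omits how the trigger is propagated across layers to a target class.

```python
import numpy as np


def dithered_reference(W, sigma, rng):
    """Clean reference: pre-trained weights plus isotropic Gaussian dither."""
    return W + sigma * rng.standard_normal(W.shape)


def sparse_backdoor_layer(W, k, theta, sigma, rng):
    """Hypothetical single-layer sketch: shift k columns along one random
    direction, then mask the result with independent Gaussian dither."""
    d_out, d_in = W.shape
    # Random unit direction in output space (assumption for illustration).
    u = rng.standard_normal(d_out)
    u /= np.linalg.norm(u)
    # Small random subset of columns that carries the perturbation.
    cols = rng.choice(d_in, size=k, replace=False)
    delta = np.zeros_like(W)
    delta[:, cols] = theta * u[:, None]  # same direction on each chosen column
    # Independent isotropic dither on every entry, as in the reference model.
    W_bd = W + delta + sigma * rng.standard_normal(W.shape)
    return W_bd, cols, u


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))  # stand-in for a pre-trained layer
W_ref = dithered_reference(W, sigma=0.1, rng=rng)
W_bd, cols, u = sparse_backdoor_layer(W, k=4, theta=0.5, sigma=0.1, rng=rng)
```

In this toy setup both `W_ref` and `W_bd` are noisy copies of `W`. The only difference is a low-rank shift confined to `k` columns, which is the kind of sparse structure a detector would have to find beneath the dither.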
