Effect of backdoor attacks over the complexity of the latent space distribution
Henry D. Chacon, Paul Rad

TL;DR
This paper investigates how backdoor attacks alter the latent space distribution in neural networks, using a novel copula-based auto-encoder to detect and quantify these changes without distributional assumptions.
Contribution
It introduces the D-vine Copula Auto-Encoder (VCAE) for estimating latent space distributions under backdoor attacks, revealing dependency structure changes and entropy increases.
Findings
Entropy in latent space increases by around 27% with backdoor triggers.
Backdoor attacks induce dependency structure changes in the input space.
The proposed VCAE effectively detects distributional differences caused by backdoors.
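The dependency-structure finding can be illustrated with a minimal sketch. This is not the authors' VCAE: it only shows the rank transform to pseudo-observations that copula models are typically fitted on, plus a pairwise Kendall's tau as a simple dependence measure. The synthetic "clean" and "backdoor" latent codes, the shared-component construction, and the function names are all assumptions made for illustration.

```python
import numpy as np

def pseudo_observations(z):
    """Map each column of z (n_samples, n_dims) into (0, 1) via scaled ranks."""
    n = z.shape[0]
    # Double argsort yields 0-based ranks per column; shift to 1..n.
    ranks = np.argsort(np.argsort(z, axis=0), axis=0) + 1
    return ranks / (n + 1.0)

def kendall_tau(x, y):
    """Kendall's tau-a for two 1-D arrays (O(n^2), no tie correction)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # Summing over ordered pairs counts each concordant/discordant pair twice.
    return (dx * dy).sum() / (n * (n - 1))

rng = np.random.default_rng(0)

# Hypothetical 2-D latent codes: independent dimensions for the clean model.
clean = rng.normal(size=(500, 2))

# Hypothetical backdoor effect: a shared component couples the dimensions.
shared = rng.normal(size=(500, 1))
backdoor = clean + 1.5 * shared

u_clean = pseudo_observations(clean)
u_backdoor = pseudo_observations(backdoor)

tau_clean = kendall_tau(u_clean[:, 0], u_clean[:, 1])
tau_backdoor = kendall_tau(u_backdoor[:, 0], u_backdoor[:, 1])

print(f"Kendall tau, clean latent codes:    {tau_clean:.3f}")
print(f"Kendall tau, backdoor latent codes: {tau_backdoor:.3f}")
```

Because the rank transform discards the marginals, any difference between the two tau values reflects a change in dependence only, which is the kind of signal a copula-based estimator can pick up without assuming a particular latent distribution.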
Abstract
The complexity of the input space determines a model's capability to extract knowledge and to translate the space of attributes into a function, which is generally assumed to be a concatenation of non-linear functions between layers. In the presence of backdoor attacks, the space complexity changes and induces similarities between classes that directly affect the model's training. As a consequence, the model tends to overfit the input set. In this research, we suggest the D-vine Copula Auto-Encoder (VCAE) as a tool to estimate the latent space distribution in the presence of backdoor triggers. Because, unlike in Variational Autoencoders (VAE), no assumptions are made on the distribution estimation, it is possible to observe the backdoor stamp in randomly generated samples of non-attacked categories. We exhibit the differences between a clean model (baseline) and the attacked one (backdoor) in a…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security
