Few-shot Backdoor Attacks via Neural Tangent Kernels
Jonathan Hayase, Sewoong Oh

TL;DR
This paper introduces a novel method for designing effective backdoor attacks on neural networks using neural tangent kernels, achieving high success rates with fewer poisoned examples and revealing vulnerabilities in overparameterized models.
Contribution
It proposes a bilevel optimization framework leveraging neural tangent kernels to craft potent backdoor poison examples, improving attack efficiency and understanding neural network vulnerabilities.
Findings
Achieves a 90% attack success rate with ten times fewer poison examples than the baseline attack.
Demonstrates vulnerability in overparameterized neural networks.
Provides kernel-based interpretation of attack mechanisms.
Abstract
In a backdoor attack, an attacker injects corrupted examples into the training set. The goal of the attacker is to cause the final trained model to predict the attacker's desired target label when a predefined trigger is added to test inputs. Central to these attacks is the trade-off between the success rate of the attack and the number of corrupted training examples injected. We pose this attack as a novel bilevel optimization problem: construct strong poison examples that maximize the attack success rate of the trained model. We use neural tangent kernels to approximate the training dynamics of the model being attacked and automatically learn strong poison examples. We experiment on subclasses of CIFAR-10 and ImageNet with WideResNet-34 and ConvNeXt architectures on periodic and patch trigger attacks and show that NTBA-designed poisoned examples achieve, for example, an attack success…
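To make the bilevel structure in the abstract concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. The inner problem (training) is approximated by kernel ridge regression, and the outer objective scores how well triggered test inputs are pushed to the target label while clean predictions are preserved. The RBF kernel is a stand-in for the neural tangent kernel, and the function names (`rbf_kernel`, `kernel_predict`, `attack_loss`), the toy data, the additive trigger, and the equal weighting of the two loss terms are all hypothetical choices made for illustration. The sketch only evaluates the outer objective for a fixed set of poison examples; it does not optimize them.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Assumption: an RBF kernel stands in for the neural tangent kernel.
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dist)

def kernel_predict(x_train, y_train, x_test, ridge=1e-3):
    # Inner problem: closed-form kernel ridge regression replaces training.
    k_train = rbf_kernel(x_train, x_train)
    alpha = np.linalg.solve(k_train + ridge * np.eye(len(x_train)), y_train)
    return rbf_kernel(x_test, x_train) @ alpha

def attack_loss(x_poison, y_poison, x_clean, y_clean, x_test, y_test,
                trigger, target):
    # Outer objective: the model is "trained" on clean plus poison data.
    x_train = np.concatenate([x_clean, x_poison])
    y_train = np.concatenate([y_clean, y_poison])
    pred_backdoor = kernel_predict(x_train, y_train, x_test + trigger)
    pred_clean = kernel_predict(x_train, y_train, x_test)
    backdoor_term = np.mean((pred_backdoor - target) ** 2)  # trigger -> target
    clean_term = np.mean((pred_clean - y_test) ** 2)        # stay accurate
    return 0.5 * backdoor_term + 0.5 * clean_term

# Toy data (hypothetical): two classes separated along the first coordinate.
rng = np.random.default_rng(0)
dim, n_train, n_test, n_poison = 5, 40, 20, 10

def sample(n):
    labels = np.where(np.arange(n) % 2 == 0, -1.0, 1.0)
    x = 0.3 * rng.standard_normal((n, dim))
    x[:, 0] += labels
    return x, labels

x_clean, y_clean = sample(n_train)
x_test, y_test = sample(n_test)
trigger = np.zeros(dim)
trigger[-1] = 3.0          # additive trigger on an otherwise unused coordinate
target = 1.0               # attacker's desired label

# Naive poison: clean inputs with the trigger added, labeled as the target.
x_poison = x_clean[:n_poison] + trigger
y_poison = np.full(n_poison, target)

loss_without = attack_loss(np.empty((0, dim)), np.empty(0),
                           x_clean, y_clean, x_test, y_test, trigger, target)
loss_with = attack_loss(x_poison, y_poison,
                        x_clean, y_clean, x_test, y_test, trigger, target)
print(loss_without, loss_with)
```

In this sketch the poison inputs are fixed by hand. An attack in the spirit of the abstract would instead treat `x_poison` as the optimization variable and minimize `attack_loss` with respect to it, for example by gradient descent through the closed-form kernel solution.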
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: ConvNeXt · Test
