Verifying Neural Networks Against Backdoor Attacks
Long H. Pham, Jun Sun

TL;DR
This paper presents a verification method combining statistical sampling and abstract interpretation to determine if neural networks are free of backdoors, addressing security concerns in critical applications.
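To show what "abstract interpretation" can mean for backdoors, here is a minimal sketch of one standard abstract domain, intervals (interval bound propagation). It asks whether any trigger can force a fixed input to the target label. The assumptions are stated here and not taken from the paper. The trigger occupies the positions in a hypothetical boolean `mask`, and each trigger value ranges over [0, 1]. The network is a plain stack of affine layers with ReLU between them. The helper names (`interval_affine`, `trigger_cannot_flip`) are illustrative. A `True` result is a sound certificate for this one input. A `False` result only means the bounds were too loose or a real trigger may exist.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    # Sound bounds on W @ x + b when lo <= x <= hi elementwise:
    # positive weights take the matching bound, negative weights the opposite one.
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    return W_pos @ lo + W_neg @ hi + b, W_pos @ hi + W_neg @ lo + b

def interval_relu(lo, hi):
    # ReLU is monotone, so applying it to both bounds stays sound.
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def trigger_cannot_flip(x, mask, layers, target):
    """True if no trigger on the masked positions (values in [0, 1]) can make
    `target` the top logit for input x; False means 'not proven'."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    lo = np.where(mask, 0.0, x)   # trigger pixels free in [0, 1]
    hi = np.where(mask, 1.0, x)   # all other pixels fixed to x
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:   # no ReLU after the output layer
            lo, hi = interval_relu(lo, hi)
    # The target never wins if some other logit is always strictly larger.
    return any(lo[c] > hi[target] for c in range(len(lo)) if c != target)

# Toy 3-input, 2-class network; class 1 is the hypothetical backdoor target.
W1, b1 = np.eye(3), np.zeros(3)
weak = [(W1, b1), (np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.1]]), np.zeros(2))]
strong = [(W1, b1), (np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]), np.zeros(2))]
x, mask = [0.9, 0.0, 0.0], [False, False, True]
print(trigger_cannot_flip(x, mask, weak, target=1))    # certified
print(trigger_cannot_flip(x, mask, strong, target=1))  # not proven
```

Intervals are the cheapest domain. They are also the loosest, so tighter domains certify more inputs at a higher cost.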
Contribution
It introduces a verification approach that certifies that a neural network contains no backdoor reaching a given success rate, rather than relying on heuristic detection methods.
Findings
Effectively verifies absence of backdoors in neural networks.
Can generate backdoor triggers when verification fails.
Outperforms existing randomized smoothing methods.
Abstract
Neural networks have achieved state-of-the-art performance on many problems, including many applications in safety- and security-critical systems. Researchers have also discovered multiple security issues associated with neural networks. One of them is backdoor attacks, i.e., a neural network may be embedded with a backdoor such that a target output is almost always generated in the presence of a trigger. Existing defense approaches mostly focus on detecting whether a neural network is 'backdoored' based on heuristics, e.g., activation patterns. To the best of our knowledge, the only line of work that certifies the absence of backdoors is based on randomized smoothing, which is known to significantly reduce neural network performance. In this work, we propose an approach to verify whether a given neural network is free of backdoors with a certain level of success rate. Our approach…
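The "success rate" of a trigger is the fraction of inputs that it pushes to the target output. For any trigger, that fraction is at most the fraction of inputs on which a sound per-input check fails to prove that no trigger flips them. Estimating this non-certified rate by sampling therefore bounds every trigger's success rate at once. The sketch below uses Wald's sequential probability ratio test (SPRT) on those Bernoulli outcomes. This is a swapped-in, standard technique and is not presented as the paper's exact procedure. The thresholds `p0 < p1`, the error rates `alpha`/`beta`, and the `sample` callback are assumptions.

```python
import math

def sprt_bernoulli(sample, p0, p1, alpha=0.05, beta=0.05, max_samples=100_000):
    """Wald's SPRT for H0: p <= p0 versus H1: p >= p1, with p0 < p1.

    `sample()` returns True when a freshly drawn input is NOT certified
    (the per-input check could not rule out a flipping trigger).
    Returns (decision, samples_used), with decision "H0", "H1" or "undecided".
    """
    upper = math.log((1 - beta) / alpha)   # crossing it: accept H1
    lower = math.log(beta / (1 - alpha))   # crossing it: accept H0
    step_true = math.log(p1 / p0)                  # log-likelihood ratio for a 1
    step_false = math.log((1 - p1) / (1 - p0))     # log-likelihood ratio for a 0
    llr = 0.0
    for n in range(1, max_samples + 1):
        llr += step_true if sample() else step_false
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", max_samples

# Hypothetical samplers: every input certified, or no input certified.
print(sprt_bernoulli(lambda: False, p0=0.05, p1=0.10))
print(sprt_bernoulli(lambda: True, p0=0.05, p1=0.10))
```

Accepting H0 supports the claim that no trigger reaches a success rate above roughly `p1`, up to the chosen error rates. Accepting H1 signals that many inputs could not be certified. Those inputs are natural starting points for searching for an actual trigger.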
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
