TL;DR
This paper evaluates the robustness of neural networks against adversarial examples. It shows that defensive distillation does not significantly improve robustness, and it introduces three new attack algorithms that succeed against both distilled and undistilled networks.
Contribution
The paper introduces three new attack algorithms that break defensive distillation and proposes a transferability test to evaluate neural network robustness.
Findings
Defensive distillation does not significantly improve robustness.
New attacks achieve 100% success on distilled and undistilled networks.
Transferability test can effectively break defensive distillation.
Abstract
Neural networks provide state-of-the-art results for most machine learning tasks. Unfortunately, neural networks are vulnerable to adversarial examples: given an input x and any target classification t, it is possible to find a new input x' that is similar to x but classified as t. This makes it difficult to apply neural networks in security-critical areas. Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from 95% to 0.5%. In this paper, we demonstrate that defensive distillation does not significantly increase the robustness of neural networks by introducing three new attack algorithms that are successful on both distilled and undistilled neural networks with 100% probability. Our attacks are tailored to…
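To make the definition of a targeted adversarial example concrete, the following is a minimal sketch on a toy linear classifier. It is not one of the paper's three attack algorithms. The model (`W`, `b`), the margin-based objective, and every parameter value (`c`, `margin`, `lr`, `steps`) are illustrative assumptions chosen so the example runs quickly with NumPy. The idea is to search for a small perturbation `delta` such that `x + delta` is classified as the target class `t` while staying close to `x`.

```python
import numpy as np

def logits(W, b, x):
    """Class scores of a toy linear classifier (an assumption, not the paper's model)."""
    return W @ x + b

def targeted_attack(W, b, x, target, c=5.0, margin=0.1, lr=0.05, steps=500):
    """Subgradient descent on ||delta||^2 + c * max(z_rival - z_target, -margin).

    Returns the smallest perturbation seen that makes x + delta classify
    as `target`, or None if no such perturbation was found.
    """
    delta = np.zeros_like(x)
    best = None
    for _ in range(steps):
        z = logits(W, b, x + delta)
        others = z.copy()
        others[target] = -np.inf           # exclude the target class
        rival = int(np.argmax(others))     # strongest competing class
        gap = z[rival] - z[target]         # negative once the target wins
        if gap < 0 and (best is None or
                        np.linalg.norm(delta) < np.linalg.norm(best)):
            best = delta.copy()
        grad = 2.0 * delta                 # gradient of the distance term
        if gap > -margin:                  # margin term is active
            grad = grad + c * (W[rival] - W[target])
        delta = delta - lr * grad
    return best

# Hypothetical three-class model on two-dimensional inputs.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
x = np.array([2.0, 0.5])                   # originally classified as class 0
t = 1                                      # target class
delta = targeted_attack(W, b, x, t)
x_adv = x + delta
```

Here `x_adv` plays the role of x' in the abstract: it lies near `x` but the toy classifier assigns it to class `t`. A real attack on a neural network would compute gradients through the network, typically by automatic differentiation, and would also have to keep `x_adv` inside the valid input range.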