TL;DR
This paper proposes defensive distillation, a training procedure that makes adversarial samples much harder to craft against deep neural networks (DNNs) by shrinking the gradients that attack algorithms rely on.
Contribution
It introduces defensive distillation as a novel method to improve DNN robustness against adversarial perturbations and provides both theoretical analysis and empirical validation.
Findings
Reduces the success rate of adversarial sample crafting from 95% to less than 0.5% on a studied DNN.
Decreases the gradients used in adversarial sample crafting by a factor of 10^30.
Increases the average minimum number of features that must be modified to create an adversarial sample by about 800%.
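The gradient reduction reported above comes from the temperature parameter of the softmax layer used in distillation. The sketch below is illustrative only and is not the authors' code: the logits are hypothetical values, and scaling them by T is a simplifying assumption that stands in for the effect of training at temperature T and then evaluating at T = 1.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature T (T=1 is the usual softmax)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def max_softmax_gradient(logits, temperature=1.0):
    """Largest |dF_i/dz_j| of the temperature softmax with respect to the logits."""
    p = softmax(logits, temperature)
    n = len(p)
    return max(
        abs(p[i] * ((1.0 if i == j else 0.0) - p[j]) / temperature)
        for i in range(n)
        for j in range(n)
    )

# Hypothetical logits for a 3-class problem (illustrative values only).
logits = [2.0, 1.0, 0.1]

# A higher temperature gives a softer distribution, which distillation
# uses as training labels for the second network.
soft_labels = softmax(logits, temperature=20.0)
hard_probs = softmax(logits, temperature=1.0)

# Assumption: training at temperature T lets the logits grow roughly T times
# larger. Evaluating those logits at T=1 saturates the softmax, so its
# gradients with respect to the logits become very small.
T = 20.0
scaled_logits = [T * z for z in logits]
grad_before = max_softmax_gradient(logits, temperature=1.0)
grad_after = max_softmax_gradient(scaled_logits, temperature=1.0)

print(soft_labels, hard_probs)
print(grad_before, grad_after)
```

In this toy setting the largest softmax gradient drops by many orders of magnitude once the logits are scaled. That is the qualitative effect the findings describe, not a reproduction of the reported 10^30 factor.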
Abstract
Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive…
