Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks

Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami

arXiv:1511.04508 · cs.CR · March 15, 2016

TL;DR

This paper proposes defensive distillation, a training technique that makes adversarial samples much harder to craft against deep neural networks by shrinking the gradients that crafting algorithms rely on.

Contribution

The paper introduces defensive distillation as a method for improving DNN robustness to adversarial perturbations, and supports it with both theoretical analysis and empirical validation.

Findings

01  Reduces the success rate of adversarial sample creation from 95% to less than 0.5% on a studied DNN.

02  Decreases the gradients used in adversarial sample crafting by a factor of 10^30.

03  Increases the average minimum number of features that must be modified to create an adversarial sample by about 800% on one of the DNNs tested.
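
The gradient reduction in finding 02 can be illustrated with a small NumPy sketch. It is not the authors' code and does not reproduce the 10^30 figure. It assumes the usual formulation of distillation, in which the softmax is computed at a temperature T during training and at T = 1 at test time, and it makes the simplifying assumption that a distilled network's logits end up roughly T times larger than an undistilled network's. The logit values and T = 20 are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax at temperature T: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def softmax_jacobian(logits, temperature=1.0):
    """Jacobian dp_i/dz_j = (diag(p) - p p^T) / T of the softmax output."""
    p = softmax(logits, temperature)
    return (np.diag(p) - np.outer(p, p)) / temperature

# Hypothetical logits for one input of a 3-class model.
logits = np.array([2.0, 1.0, 0.5])
T = 20.0  # illustrative distillation temperature

# A high temperature yields smoother "soft labels" than T = 1.
hard_probs = softmax(logits, temperature=1.0)
soft_labels = softmax(logits, temperature=T)

# Assumption for illustration: after training at temperature T, the
# distilled model's logits are about T times larger. Evaluated at T = 1,
# the softmax saturates and its gradients become very small.
distilled_logits = T * logits
grad_plain = np.abs(softmax_jacobian(logits)).max()
grad_distilled = np.abs(softmax_jacobian(distilled_logits)).max()

print("probabilities at T=1: ", hard_probs)
print("soft labels at T=20:  ", soft_labels)
print("largest gradient entry, plain:     ", grad_plain)
print("largest gradient entry, distilled: ", grad_distilled)
```

Under these assumptions the largest Jacobian entry of the saturated softmax is several orders of magnitude smaller than that of the unscaled one, which is the qualitative effect that gradient-based crafting algorithms run into.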

Abstract

Deep learning algorithms have been shown to perform extremely well on many classical machine learning problems. However, recent studies have shown that deep learning, like other machine learning techniques, is vulnerable to adversarial samples: inputs crafted to force a deep neural network (DNN) to provide adversary-selected outputs. Such attacks can seriously undermine the security of the system supported by the DNN, sometimes with devastating consequences. For example, autonomous vehicles can be crashed, illicit or illegal content can bypass content filters, or biometric authentication systems can be manipulated to allow improper access. In this work, we introduce a defensive mechanism called defensive distillation to reduce the effectiveness of adversarial samples on DNNs. We analytically investigate the generalizability and robustness properties granted by the use of defensive…
