Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations

Alex Lamb; Jonathan Binas; Anirudh Goyal; Dmitriy Serdyuk; Sandeep Subramanian; Ioannis Mitliagkas; Yoshua Bengio

arXiv:1804.02485 · stat.ML · April 10, 2018 · 29 cites

TL;DR

This paper introduces Fortified Networks, a method that improves the robustness of deep networks by detecting hidden-layer representations that fall off the data manifold and mapping them back onto it. This improves resistance to adversarial attacks, and the authors' experiments suggest the gains are not primarily due to gradient masking.

Contribution

The paper presents a novel approach to improving deep network robustness by fortifying hidden layers. This strategy reduces vulnerability to adversarial examples and proves more effective than applying such modifications in input space.

Findings

1. Enhanced robustness to adversarial attacks in both black-box and white-box settings.
2. Improvements are not primarily due to gradient masking.
3. Fortifying hidden layers is more effective than input-space modifications.
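The findings refer to white-box attacks, in which the attacker can compute gradients of the model's loss with respect to its input. One standard white-box attack is the Fast Gradient Sign Method (FGSM). The sketch below applies FGSM to a hypothetical logistic-regression classifier rather than to the networks evaluated in the paper. The weights, input, label, and perturbation budget `eps` are made-up values for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(x, y, w, b):
    # Binary cross-entropy of a logistic-regression model on one example.
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method against a logistic-regression model.

    Takes one step of size eps along the sign of the input gradient of the
    loss, so each input coordinate moves by at most eps (an L-infinity
    perturbation bounded by eps).
    """
    p = sigmoid(x @ w + b)          # predicted probability of class 1
    grad_x = (p - y) * w            # d(bce_loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)


# Hypothetical toy model and input, for illustration only.
w = np.array([1.0, -2.0])
b = 0.0
x = np.array([0.5, -0.5])
y = 1

x_adv = fgsm(x, y, w, b, eps=0.1)
print(bce_loss(x_adv, y, w, b) > bce_loss(x, y, w, b))  # True
```

A single signed-gradient step is the simplest white-box attack. Iterated variants such as projected gradient descent repeat this step and project back into the `eps` ball after each one.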

Abstract

Deep networks have achieved impressive results across a variety of important tasks. However, a known weakness is a failure to perform well when evaluated on data that differ from the training distribution, even if these differences are very small, as is the case with adversarial examples. We propose Fortified Networks, a simple transformation of existing networks, which fortifies the hidden layers in a deep network by identifying when the hidden states are off the data manifold and mapping these hidden states back to parts of the data manifold where the network performs well. Our principal contribution is to show that fortifying these hidden states improves the robustness of deep networks, and our experiments (i) demonstrate improved robustness to standard adversarial attacks in both black-box and white-box threat models; (ii) suggest that our improvements are not primarily due to the…
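The abstract above does not specify how off-manifold hidden states are detected or mapped back. The sketch below assumes a small autoencoder inserted after a hidden layer. It uses the squared reconstruction error as an "off-manifold" score and two training objectives. The first is a denoising reconstruction loss on clean hidden states. The second is an adversarial loss that pulls the reconstruction of an adversarial input's hidden state toward the clean input's hidden state.

`FortifiedLayer`, `reconstruction_loss`, `adversarial_loss`, the tanh default, and the toy linear manifold are all hypothetical choices for illustration. This is not code from the jbinas/fortified-networks repository.

```python
import numpy as np


class FortifiedLayer:
    """Hypothetical fortified layer: a one-layer autoencoder applied to a
    batch of hidden states h with shape (batch, d)."""

    def __init__(self, w_enc, b_enc, w_dec, b_dec, act=np.tanh):
        self.w_enc, self.b_enc = w_enc, b_enc
        self.w_dec, self.b_dec = w_dec, b_dec
        self.act = act

    def __call__(self, h):
        # Encode h into a code, then decode back to shape (batch, d).
        code = self.act(h @ self.w_enc + self.b_enc)
        return code @ self.w_dec + self.b_dec

    def off_manifold_score(self, h):
        # Per-example squared reconstruction error; large values suggest that
        # h lies far from the states the autoencoder reconstructs well.
        return np.sum((self(h) - h) ** 2, axis=1)


def reconstruction_loss(layer, h_clean, noise_std, rng):
    # Denoising objective: corrupt clean hidden states, reconstruct the originals.
    noisy = h_clean + noise_std * rng.standard_normal(h_clean.shape)
    return np.mean(np.sum((layer(noisy) - h_clean) ** 2, axis=1))


def adversarial_loss(layer, h_adv, h_clean):
    # Pull the reconstruction of an adversarial input's hidden state toward
    # the hidden state of the corresponding clean input.
    return np.mean(np.sum((layer(h_adv) - h_clean) ** 2, axis=1))


# Toy example: the "manifold" is the first coordinate axis in R^2, and a linear
# autoencoder that projects onto it stands in for a trained one.
u = np.array([[1.0], [0.0]])           # orthonormal basis of the manifold, shape (2, 1)
layer = FortifiedLayer(u, np.zeros(1), u.T, np.zeros(2), act=lambda z: z)

h_clean = np.array([[3.0, 0.0]])       # on the manifold
h_adv = np.array([[3.0, 2.0]])         # perturbed off the manifold

print(layer(h_adv))                    # [[3. 0.]]
print(layer.off_manifold_score(h_clean), layer.off_manifold_score(h_adv))  # [0.] [4.]
print(adversarial_loss(layer, h_adv, h_clean))  # 0.0
```

Here the projector is chosen by hand so the mapping back to the manifold is easy to see. One plausible setup, assumed here rather than taken from the abstract, trains the autoencoder's weights by gradient descent on the classifier's loss plus `reconstruction_loss` and `adversarial_loss`.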

Code & Models

Repositories

jbinas/fortified-networks (TensorFlow)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications