Feature Losses for Adversarial Robustness
Kirthi Shankar Sivamani

TL;DR
This paper introduces a defense against adversarial attacks that preprocesses inputs with a denoising autoencoder trained on perceptual losses over feature maps, improving robustness on the MNIST and CIFAR10 datasets.
Contribution
The paper proposes a new defense technique that trains an autoencoder on perceptual losses computed from feature maps. The defense is resilient against certain adversarial attacks and can be integrated with existing CNNs.
Findings
Achieves near state-of-the-art defense performance on MNIST and CIFAR10.
Effective against simple and iterative L_p attacks.
Can be used as a preprocessing step for any CNN.
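The findings above distinguish simple from iterative norm-bounded attacks. As a rough illustration only, the sketch below contrasts a single signed-gradient step with an iterated, projected version under an L-infinity budget. The toy linear scorer, the names `single_step` and `iterative`, and the values of `eps`, `alpha`, and `steps` are all assumptions made for this example; the summary does not say which specific attacks the paper evaluates.

```python
# Illustrative sketch, not the paper's attack setup.
# Assumption: a toy linear scorer s(x) = w . x, whose input gradient is w.

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(w, x):
    """Toy linear score; stands in for a classifier's logit."""
    return sum(wi * xi for wi, xi in zip(w, x))

def single_step(x, grad, eps):
    """One signed-gradient step of size eps that lowers the score."""
    return [xi - eps * sign(g) for xi, g in zip(x, grad)]

def iterative(x, grad_fn, eps, alpha, steps):
    """Repeated small signed steps, clipped back into the eps-ball around x."""
    adv = list(x)
    for _ in range(steps):
        grad = grad_fn(adv)
        adv = [a - alpha * sign(g) for a, g in zip(adv, grad)]
        # Projection: keep every coordinate within eps of the original input.
        adv = [min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x)]
    return adv

w = [0.5, -1.0, 2.0]          # hypothetical weights
x = [1.0, 1.0, 1.0]           # hypothetical clean input
eps = 0.1                     # hypothetical L-infinity budget

def grad_fn(_point):
    # For a linear scorer the gradient is constant and equal to w.
    return w

x_single = single_step(x, grad_fn(x), eps)
x_iter = iterative(x, grad_fn, eps, alpha=0.03, steps=5)
```

For a linear scorer the two variants end at the same point once the iterated steps exhaust the budget; the iterative form matters for nonlinear models, where the gradient is recomputed at each step.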
Abstract
Deep learning has made tremendous advances in computer vision tasks such as image classification. However, recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this work, we propose a novel approach to defending against adversarial attacks. We employ an input processing technique based on denoising autoencoders as a defense. It has been shown that the input perturbations grow and accumulate as noise in feature maps while propagating through a convolutional neural network (CNN). We exploit the noisy feature maps by using an additional subnetwork to extract image feature maps and train an auto-encoder on perceptual losses of these feature maps. This technique achieves close to state-of-the-art results on defending MNIST and CIFAR10 datasets, but more importantly, shows a new way of employing a…
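The idea of a loss computed on feature maps rather than on pixels can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the feature extractor here is a hypothetical fixed linear layer followed by ReLU (standing in for the subnetwork), the weights `W` and the input vectors are made up, and the autoencoder itself is omitted. The sketch assumes the loss compares the features of the autoencoder's output with the features of the clean image.

```python
# Illustrative sketch of a feature (perceptual) loss; all names are hypothetical.

def features(W, x):
    """Hypothetical fixed feature extractor: linear layer followed by ReLU."""
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [max(0.0, p) for p in pre]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def feature_loss(W, denoised, clean):
    """Distance between the feature maps of the denoised and clean inputs."""
    return mse(features(W, denoised), features(W, clean))

# Made-up extractor weights and inputs, chosen so the result is easy to check.
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -1.0],
]
clean = [0.2, 0.8, 0.5, 0.1]
denoised = [0.3, 0.7, 0.5, 0.1]   # stand-in for an autoencoder's output

loss_identical = feature_loss(W, clean, clean)
loss_denoised = feature_loss(W, denoised, clean)
```

In a training loop, the autoencoder's parameters would be updated to reduce `feature_loss` while the feature extractor stays fixed; that part is left out here because the summary does not describe the training details.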
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Bacillus and Francisella bacterial research
