# BlurNet: Defense by Filtering the Feature Maps

**Authors:** Ravi Raju, Mikko Lipasti

arXiv: 1908.02256 · 2020-05-19

## TL;DR

BlurNet defends neural networks against the $RP_2$ adversarial attack by low-pass filtering the first-layer feature maps to remove the high-frequency noise the attack introduces. When this filtering behavior is built into training through total variation regularization, the adaptive attack's success rate drops from 90% to 20%.

## Contribution

The paper introduces BlurNet, a defense that low-pass filters the feature maps after the first layer using a depthwise convolution layer of standard blur kernels. This improves robustness against the $RP_2$ attack.
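The filtering step can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes a 3x3 box (mean) kernel and zero "same" padding, since the paper only says "standard blur kernels". The helper names `box_blur_channel` and `depthwise_blur` are hypothetical. Feature maps are nested lists, one H x W grid per channel.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature map


def box_blur_channel(fmap: Channel, k: int = 3) -> Channel:
    """Convolve one feature map with a k x k averaging (box) kernel.

    Assumption: zero padding so the output keeps the input's H x W size,
    as a padded convolution layer would.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # out-of-bounds taps read as zero
                        acc += fmap[y][x]
            out[i][j] = acc / (k * k)  # every kernel tap has weight 1 / k^2
    return out


def depthwise_blur(fmaps: List[Channel], k: int = 3) -> List[Channel]:
    """Depthwise convolution: each channel is blurred independently with
    its own copy of the same fixed kernel, so channels are never mixed."""
    return [box_blur_channel(c, k) for c in fmaps]
```

For example, a single spike of value 9 at the center of a 5x5 map is spread into a 3x3 patch of ones, which is exactly the kind of sharp, isolated response a low-pass filter suppresses. Using a depthwise (per-channel) layer keeps the filter cheap and leaves the mixing of channels to the network's learned layers.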

## Key findings

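- The $RP_2$ attack introduces high-frequency noise into the input image, as a frequency analysis of the first-layer feature maps on the LISA dataset shows.
- In a black-box transfer attack, low-pass filtering the feature maps is more beneficial than filtering the input.
- Regularization schemes incorporate the low-pass filtering behavior into training and are evaluated under white-box attacks.
- Under an adaptive attack, total variation regularization lowers the attack success rate from 90% to 20%.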
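To make the frequency-analysis finding concrete, the sketch below measures what fraction of a feature map's spectral energy lies above a cutoff frequency. It is a generic spectral measure, not the paper's exact analysis procedure; the cutoff of 0.25 cycles per sample and the names `dft2` and `high_freq_fraction` are assumptions chosen for illustration.

```python
import cmath
from typing import List

Grid = List[List[float]]  # one H x W feature map


def dft2(img: Grid) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform (O(H^2 W^2); fine for tiny maps)."""
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    s += img[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
            out[u][v] = s
    return out


def high_freq_fraction(img: Grid, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy |F|^2 at normalized frequency above `cutoff`.

    Frequencies are in cycles per sample, in [0, 0.5]. Index u maps to
    min(u, H - u) / H because the DFT spectrum wraps around.
    """
    h, w = len(img), len(img[0])
    spec = dft2(img)
    total = high = 0.0
    for u in range(h):
        for v in range(w):
            energy = abs(spec[u][v]) ** 2
            fu = min(u, h - u) / h
            fv = min(v, w - v) / w
            total += energy
            if max(fu, fv) > cutoff:  # outside the low-frequency square
                high += energy
    return high / total if total > 0 else 0.0
```

A constant 8x8 map puts all its energy at the zero frequency, so its fraction is close to 0, while an alternating +1/-1 checkerboard (a crude stand-in for sharp black-and-white sticker edges) puts its energy at the highest frequency, so its fraction is close to 1.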

## Abstract

Recently, the field of adversarial machine learning has been garnering attention by showing that state-of-the-art deep neural networks are vulnerable to adversarial examples, stemming from small perturbations being added to the input image. Adversarial examples are generated by a malicious adversary by obtaining access to the model parameters, such as gradient information, to alter the input or by attacking a substitute model and transferring those malicious examples over to attack the victim model. Specifically, one of these attack algorithms, Robust Physical Perturbations ($RP_2$), generates adversarial images of stop signs with black and white stickers to achieve high targeted misclassification rates against standard-architecture traffic sign classifiers. In this paper, we propose BlurNet, a defense against the $RP_2$ attack. First, we motivate the defense with a frequency analysis of the first layer feature maps of the network on the LISA dataset, which shows that high frequency noise is introduced into the input image by the $RP_2$ algorithm. To remove the high frequency noise, we introduce a depthwise convolution layer of standard blur kernels after the first layer. We perform a blackbox transfer attack to show that low-pass filtering the feature maps is more beneficial than filtering the input. We then present various regularization schemes to incorporate this low-pass filtering behavior into the training regime of the network and perform white-box attacks. We conclude with an adaptive attack evaluation to show that the success rate of the attack drops from 90% to 20% with total variation regularization, one of the proposed defenses.
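Total variation (TV) regularization penalizes differences between neighboring activations, which pushes the network toward smooth, low-frequency feature maps. The sketch below assumes the anisotropic form (sum of absolute vertical and horizontal neighbor differences) and a hypothetical weight `lam`; the abstract does not state which TV variant or weighting the authors use, and `total_variation` and `regularized_loss` are illustrative names.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature map


def total_variation(fmap: Channel) -> float:
    """Anisotropic TV: sum of |differences| between vertically and
    horizontally adjacent entries of one feature map."""
    h, w = len(fmap), len(fmap[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:  # vertical neighbor
                tv += abs(fmap[i + 1][j] - fmap[i][j])
            if j + 1 < w:  # horizontal neighbor
                tv += abs(fmap[i][j + 1] - fmap[i][j])
    return tv


def regularized_loss(task_loss: float, fmaps: List[Channel], lam: float = 0.01) -> float:
    """Task loss plus a TV penalty summed over all channels of a layer's output.

    `lam` is a hypothetical trade-off weight between accuracy and smoothness.
    """
    return task_loss + lam * sum(total_variation(c) for c in fmaps)
```

A constant map has zero TV, while a 2x2 checkerboard `[[0, 1], [1, 0]]` has TV 4, so noisy, high-frequency activations are penalized most. In real training this term would be computed inside an autodiff framework so its gradient reaches the first-layer weights; plain Python only shows the quantity being penalized.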

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.02256/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1908.02256/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1908.02256/full.md

---
Source: https://tomesphere.com/paper/1908.02256