Unreasonable Effectiveness of Last Hidden Layer Activations for Adversarial Robustness

Omer Faruk Tuna; Ferhat Ozgur Catak; M. Taner Eskil

arXiv:2202.07342 · cs.LG · May 17, 2022

Open Access

TL;DR

This paper demonstrates that using high-temperature activation functions in the output layer of DNNs can significantly improve adversarial robustness by nullifying gradients, thus hindering gradient-based attack methods.

Contribution

The study introduces a novel approach of applying high-temperature activation functions at the output layer to enhance adversarial robustness, supported by mathematical analysis and empirical validation.

Findings

01 High-temperature activations shrink the output-layer gradients toward zero, hindering gradient-based attacks.

02 The approach improves robustness on the MNIST and CIFAR10 datasets.

03 Enhanced non-linearity offers additional defense against certain attacks.
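The first finding can be illustrated with a minimal NumPy sketch. It is not the authors' code: it assumes the output layer applies tanh, that the temperature `T` enters as a multiplier on the pre-activation (the text here does not state how the temperature is applied), and that the loss is softmax cross-entropy. The pre-activation values and the helper name `grad_wrt_preactivation` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def grad_wrt_preactivation(a, y, T):
    """Gradient of cross-entropy w.r.t. pre-activations `a` when the output
    layer computes z = tanh(T * a) and softmax is applied to z.

    Assumption: T multiplies the pre-activation (not confirmed by the source).
    """
    z = np.tanh(T * a)
    p = softmax(z)
    dL_dz = p - y                      # softmax + cross-entropy gradient w.r.t. z
    dz_da = T * (1.0 - z ** 2)         # derivative of tanh(T * a) w.r.t. a
    return dL_dz * dz_da               # chain rule, elementwise

a = np.array([2.0, -1.0, 0.5])         # hypothetical pre-activations
y = np.array([1.0, 0.0, 0.0])          # one-hot label

temperatures = (1.0, 10.0, 100.0)
norms = [np.abs(grad_wrt_preactivation(a, y, T)).max() for T in temperatures]

for T, n in zip(temperatures, norms):
    print(f"T={T:>5}: max |grad| = {n:.3e}")
```

Under these assumptions the largest gradient component falls as `T` grows, because tanh saturates and its derivative decays faster than the factor `T` increases. The effect depends on the pre-activations lying away from zero; a pre-activation close to zero is not saturated and would not show the same shrinkage.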

Abstract

In standard Deep Neural Network (DNN) based classifiers, the general convention is to omit the activation function in the last (output) layer and directly apply the softmax function on the logits to get the probability scores of each class. In this type of architecture, the loss value of the classifier against any output class is directly proportional to the difference between the final probability score and the label value of the associated class. Standard white-box adversarial evasion attacks, whether targeted or untargeted, mainly try to exploit the gradient of the model loss function to craft adversarial samples and fool the model. In this study, we show both mathematically and experimentally that using some widely known activation functions in the output layer of the model with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted…
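The relationship between the loss, the probability score, and the label that the abstract describes is usually written in terms of the gradient with respect to the logits. The short derivation below assumes a cross-entropy loss with a one-hot label; the symbols $z$, $p$, $y$, and $L$ are introduced here for illustration and are not taken from the page.

```latex
% Softmax probabilities from logits z, and cross-entropy loss with one-hot label y
p_k = \frac{e^{z_k}}{\sum_j e^{z_j}}, \qquad
L = -\sum_k y_k \log p_k

% Using \partial p_i / \partial z_k = p_i(\delta_{ik} - p_k) and \sum_i y_i = 1:
\frac{\partial L}{\partial z_k}
  = -\sum_i y_i \,(\delta_{ik} - p_k)
  = p_k - y_k
```

A white-box attack that backpropagates this quantity to the input receives a usable signal whenever $p_k \neq y_k$, which is the signal the high-temperature output activation is reported to suppress.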

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Softmax