Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Ricardo Bigolin Lanfredi; Joyce D. Schroeder; Tolga Tasdizen

arXiv:2009.04709 · stat.ML · April 21, 2023

TL;DR

This paper introduces a new way to measure the alignment of model gradients in adversarial training, showing that better alignment correlates with increased robustness against attacks.

Contribution

It proposes a definition of the direction with which model gradients align after adversarial training, together with a metric to evaluate that alignment, and reports that enforcing the alignment improves adversarial robustness.

Findings

01 PGD-trained models show higher gradient alignment than the baseline.

02 The proposed metric outperforms existing metrics.

03 Enforcing alignment enhances model robustness.

Abstract

Adversarial training, especially projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs have a preferential direction. However, the direction of alignment is not mathematically well established, making it difficult to evaluate quantitatively. We propose a novel definition of this direction as the direction of the vector pointing toward the closest point of the support of the closest inaccurate class in decision space. To evaluate the alignment with this direction after adversarial training, we apply a metric that uses generative adversarial networks to produce the smallest residual needed to change the class present in the image. We show that PGD-trained models have a higher alignment than the baseline according to our definition, that our…
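
The idea of measuring how well an input gradient aligns with a reference direction can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the alignment is measured as cosine similarity, the model is a two-dimensional logistic classifier with an analytic input gradient, and the "closest point of the other class" is picked from a short hypothetical list of samples instead of being produced by a generative adversarial network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the input x.

    For p = sigmoid(w.x + b) and label y in {0, 1}, dL/dx = (p - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def closest_point(x, points):
    """Nearest point to x in Euclidean distance (toy stand-in for the GAN)."""
    return min(points, key=lambda q: sum((qi - xi) ** 2 for qi, xi in zip(q, x)))


# Hypothetical toy setup: class 0 lies at x[0] < 0, class 1 at x[0] > 0.
w, b = [1.0, 0.0], 0.0
x, y = [-1.0, 0.0], 0
other_class = [[1.0, 0.0], [2.0, 3.0]]  # assumed samples of the other class

target = closest_point(x, other_class)
direction = [t - xi for t, xi in zip(target, x)]  # vector toward the other class
grad = input_gradient(w, b, x, y)                 # direction that increases the loss
alignment = cosine(grad, direction)
print(f"alignment = {alignment:.3f}")
```

In this toy case the loss gradient points straight at the nearest sample of the other class, so the cosine similarity is at its maximum. For an image classifier the gradient would instead come from automatic differentiation, and the reference direction from whatever procedure supplies the closest point of the other class.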

Code & Models

Repositories

ricbl/gradient-direction-of-robust-models (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Anomaly Detection Techniques and Applications

Methods: Interpretability