Characterizing Model Robustness via Natural Input Gradients

Adrián Rodríguez-Muñoz; Tongzhou Wang; Antonio Torralba

arXiv:2409.20139 · cs.LG · October 1, 2024

TL;DR

This paper demonstrates that regularizing input gradients on natural examples, especially in vision transformers with smooth activations, can achieve robustness comparable to adversarial training at a lower computational cost.
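A minimal sketch of a Gradient Norm objective on natural examples, assuming the penalty is the squared L2 norm of the input gradient of the cross-entropy loss, weighted by a hypothetical coefficient `lam`; the norm and weighting used in the paper may differ. To stay dependency-free it uses a linear softmax classifier, whose input gradient has the closed form `W^T (p - onehot(y))`. For a vision transformer one would instead obtain the input gradient with automatic differentiation and keep the graph so the penalty itself can be backpropagated. All function names here are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(z: Vector) -> Vector:
    # Subtract the max logit for numerical stability
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W: Matrix, b: Vector, x: Vector) -> Vector:
    # z_k = sum_j W[k][j] * x[j] + b[k]
    return [sum(w * xj for w, xj in zip(row, x)) + bk for row, bk in zip(W, b)]


def cross_entropy(W: Matrix, b: Vector, x: Vector, y: int) -> float:
    return -math.log(softmax(logits(W, b, x))[y])


def input_gradient(W: Matrix, b: Vector, x: Vector, y: int) -> Vector:
    # For a linear softmax model, dCE/dx = W^T (p - onehot(y))
    p = softmax(logits(W, b, x))
    delta = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(W[k][j] * delta[k] for k in range(len(W))) for j in range(len(x))]


def gradient_norm_loss(W: Matrix, b: Vector, x: Vector, y: int, lam: float) -> float:
    # Natural-example loss plus a squared-L2 penalty on the input gradient
    # (assumed penalty form; lam is a hypothetical regularization weight)
    g = input_gradient(W, b, x, y)
    return cross_entropy(W, b, x, y) + lam * sum(gj * gj for gj in g)


W = [[1.0, -2.0], [0.5, 0.3], [-1.0, 1.0]]
b = [0.1, 0.0, -0.1]
x = [0.2, -0.1]
print(gradient_norm_loss(W, b, x, y=0, lam=0.1))
```

Training on this loss means differentiating the penalty with respect to `W` and `b`, a second-order (double backpropagation) step, but it needs no inner loop that searches for adversarial perturbations.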

Contribution

It shows that input gradient regularization on natural data is effective, especially for models with smooth activations, challenging the prior belief that it is a much inferior alternative to adversarial training.
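A small illustration, not taken from the paper, of what "smooth" means for the activations in question: the derivative of ReLU jumps from 0 to 1 at the origin, while the derivative of the exact, erf-based GELU (the activation used in standard vision transformers) changes continuously. Because a network's input gradient is assembled from these derivatives, one intuition is that ReLU networks can have input gradients that change abruptly between nearby inputs, so a small gradient at a natural example says less about its neighborhood. The helper names are hypothetical.

```python
import math


def relu_grad(x: float) -> float:
    # Derivative of ReLU: a step function (undefined at 0; we return 0 there)
    return 1.0 if x > 0 else 0.0


def gelu_grad(x: float) -> float:
    # Exact GELU(x) = x * Phi(x); its derivative Phi(x) + x * phi(x) is continuous
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def derivative_jump(grad_fn, x0: float = 0.0, eps: float = 1e-3) -> float:
    # How much the activation's slope changes across a tiny interval around x0
    return abs(grad_fn(x0 + eps) - grad_fn(x0 - eps))


print(derivative_jump(relu_grad))  # 1.0
print(derivative_jump(gelu_grad))
```

Shrinking `eps` leaves the ReLU jump at 1.0 while the GELU jump goes to zero, which is the continuity difference the sketch is meant to expose.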

Findings

01

Gradient Norm regularization performs well on vision transformers with smooth activations.

02

Gradient Norm training achieves over 90% of the performance of adversarial training on ImageNet-1k.

03

Regularizing input gradients to concentrate on image edges improves robustness without explicitly constraining the gradient norm.
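The third finding can be made concrete with a hypothetical edge-alignment penalty. This is not the paper's formulation: it assumes a crude edge map (finite-difference intensity gradient magnitude, standing in for a proper detector such as Sobel or Canny) and scores the input gradient by one minus the cosine similarity between its magnitude and that edge map. The function names and the cosine choice are illustrative assumptions.

```python
import math
from typing import List

Image = List[List[float]]


def edge_map(img: Image) -> Image:
    # Hypothetical edge detector: magnitude of forward finite differences
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            dx = img[i][j + 1] - img[i][j] if j + 1 < w else 0.0
            dy = img[i + 1][j] - img[i][j] if i + 1 < h else 0.0
            out[i][j] = math.hypot(dx, dy)
    return out


def edge_alignment_penalty(input_grad: Image, img: Image) -> float:
    # 1 - cosine similarity between |input gradient| and the edge map;
    # 0 when the gradient sits exactly on edges, 1 when it avoids them
    edges = edge_map(img)
    a = [abs(v) for row in input_grad for v in row]
    e = [v for row in edges for v in row]
    dot = sum(p * q for p, q in zip(a, e))
    na = math.sqrt(sum(p * p for p in a))
    ne = math.sqrt(sum(q * q for q in e))
    if na == 0.0 or ne == 0.0:
        return 1.0
    return 1.0 - dot / (na * ne)


# A 4x4 image with a vertical edge between columns 1 and 2
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
grad_on_edge = [[0.0, 1.0, 0.0, 0.0] for _ in range(4)]
grad_off_edge = [[0.0, 0.0, 0.0, 1.0] for _ in range(4)]
print(edge_alignment_penalty(grad_on_edge, img))   # 0.0
print(edge_alignment_penalty(grad_off_edge, img))  # 1.0
```

Because cosine similarity ignores the overall scale of `input_grad`, this penalty shapes where the gradient mass lies but not how large it is, which is the sense in which such a regularizer avoids an explicit gradient norm constraint.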

Abstract

Adversarially robust models are locally smooth around each data sample, so that small perturbations cannot drastically change model outputs. In modern systems, such smoothness is usually obtained via Adversarial Training, which explicitly requires models to perform well on perturbed examples. In this work, we show the surprising effectiveness of instead regularizing the gradient with respect to model inputs on natural examples only. Penalizing the input Gradient Norm is commonly believed to be a much inferior approach. Our analyses identify that the performance of Gradient Norm regularization critically depends on the smoothness of activation functions, and that it is in fact extremely effective on modern vision transformers that adopt smooth activations over piecewise linear ones (e.g., ReLU), contrary to prior belief. On ImageNet-1k, Gradient Norm training achieves over 90% of the performance of…
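A standard first-order argument (general background, not reproduced from this page) suggests why penalizing input gradients on natural examples can stand in for adversarial training. Assuming an ℓ∞ threat model with budget ε and a loss L that is approximately linear inside the ε-ball, a first-order Taylor expansion and the duality between the ℓ∞ and ℓ1 norms give:

```latex
\begin{aligned}
\max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(x+\delta, y)
  &\approx \mathcal{L}(x, y) + \max_{\|\delta\|_\infty \le \epsilon} \delta^\top \nabla_x \mathcal{L}(x, y) \\
  &= \mathcal{L}(x, y) + \epsilon \, \|\nabla_x \mathcal{L}(x, y)\|_1
\end{aligned}
```

The right-hand side is the natural loss plus a gradient norm penalty, so minimizing it requires no inner attack. The approximation is only as good as the local linearity of L, which is one way to read the abstract's point that activation smoothness matters; the exact norm and weighting used in the paper may differ.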

Code & Models

Repositories

adriarm/robustness_input_gradients (PyTorch, official)

Taxonomy

Topics: Model Reduction and Neural Networks · Fault Detection and Control Systems · Neural Networks and Applications