Towards More Robust Interpretation via Local Gradient Alignment

Sunghwan Joo; Seokhyeon Jeong; Juyeon Heo; Adrian Weller; Taesup Moon

arXiv:2211.15900·cs.CV·December 8, 2022

TL;DR

This paper introduces a normalization-invariant approach to improve the robustness of neural network interpretation methods, demonstrating enhanced interpretability on large-scale datasets without sacrificing accuracy.

Contribution

It proposes a combined gradient regularization method based on $\ell_2$ and cosine distance criteria, addressing the normalization issues that affect the robustness of feature attributions.
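As a rough illustration of what combining the two criteria could look like, the sketch below adds a squared $\ell_2$ term and a cosine distance term between the input gradient at a clean point and at a perturbed point. The function name, the weights `lam_l2` and `lam_cos`, and the exact form of each term are assumptions made for this example; the paper's actual regularizer may differ.

```python
import numpy as np

def combined_gradient_distance(g_clean, g_perturbed, lam_l2=1.0, lam_cos=1.0, eps=1e-12):
    """Hypothetical combined criterion: weighted squared l2 distance plus cosine distance.

    g_clean / g_perturbed: input gradients (attributions) at x and at a perturbed x.
    lam_l2 / lam_cos: illustrative weights, not values taken from the paper.
    """
    g_clean = np.asarray(g_clean, dtype=float).ravel()
    g_perturbed = np.asarray(g_perturbed, dtype=float).ravel()

    # Squared l2 distance: sensitive to the magnitude of the gradients.
    l2_term = np.sum((g_clean - g_perturbed) ** 2)

    # Cosine distance: depends only on the direction of the gradients.
    denom = np.linalg.norm(g_clean) * np.linalg.norm(g_perturbed) + eps
    cos_term = 1.0 - (g_clean @ g_perturbed) / denom

    return lam_l2 * l2_term + lam_cos * cos_term

# Orthogonal unit gradients: squared l2 distance is 2, cosine distance is 1.
example = combined_gradient_distance([1.0, 0.0], [0.0, 1.0])
```

In a training loop, a term of this kind would be added to the classification loss; how the perturbed point is chosen is left out of this sketch.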

Findings

01

Models trained with the proposed method yield more robust interpretations.

02

The approach is effective on large-scale datasets like ImageNet-100.

03

It maintains model accuracy while improving interpretability robustness.

Abstract

Neural network interpretation methods, particularly feature attribution methods, are known to be fragile with respect to adversarial input perturbations. To address this, several methods for enhancing the local smoothness of the gradient while training have been proposed for attaining robust feature attributions. However, the lack of considering the normalization of the attributions, which is essential in their visualizations, has been an obstacle to understanding and improving the robustness of feature attribution methods. In this paper, we provide new insights by taking such normalization into account. First, we show that for every non-negative homogeneous neural network, a naive $\ell_2$-robust criterion for gradients is not normalization invariant, which means that two functions with the same normalized gradient can have different values. Second, we formulate a…
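The normalization point in the abstract can be checked numerically on a toy case. The sketch below uses a small bias-free ReLU network (which is non-negatively homogeneous) and compares it with the same network scaled by a constant `c`. Both have identical normalized input gradients, yet the $\ell_2$ distance between gradients at two inputs scales with `c`, while the cosine distance does not. The weights, inputs, and helper names are made up for this illustration and are not taken from the paper.

```python
import numpy as np

# Toy bias-free ReLU network f(x) = scale * w2 . relu(W1 x); weights are arbitrary.
W1 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [-1.0, 1.0]])
w2 = np.array([1.0, 2.0, 3.0])

def input_gradient(x, scale=1.0):
    """Analytic gradient of f with respect to x for the toy network above."""
    active = (W1 @ x > 0).astype(float)   # ReLU activation pattern at x
    return scale * (W1.T @ (w2 * active))

def l2_distance(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1.0, 0.5])        # clean input
x_pert = np.array([0.5, 1.0])   # perturbed input with a different activation pattern

results = {}
for c in (1.0, 10.0):
    g, g_pert = input_gradient(x, c), input_gradient(x_pert, c)
    results[c] = (l2_distance(g, g_pert), cosine_distance(g, g_pert))

for c, (l2, cos) in results.items():
    print(f"scale={c}: l2 distance={l2:.4f}, cosine distance={cos:.4f}")
```

Scaling the function by 10 multiplies the $\ell_2$ distance by 10 but leaves the cosine distance unchanged, even though the normalized attribution maps of the two functions are the same at every input.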

Code & Models

Repositories

joshua840/robustaga · PyTorch · Official

Videos

Towards More Robust Interpretation via Local Gradient Alignment · Underline

Taxonomy

Topics: COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning