# A Practical Upper Bound for the Worst-Case Attribution Deviations

**Authors:** Fan Wang, Adams Wai-Kin Kong

arXiv: 2303.00340 · 2023-03-02

## TL;DR

This paper derives an upper bound on how far a model's attributions can deviate when the input is perturbed by any noise within a bounded region while the predicted class stays the same, giving a quantifiable measure of attribution robustness for deep neural networks.

## Contribution

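It formulates, for the first time, a constrained optimization problem whose solution upper-bounds the deviation of attributions under bounded perturbations, and it introduces practical bounds based on Euclidean distance and cosine similarity under both $\ell_2$- and $\ell_\infty$-norm perturbation constraints.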
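
As a reading aid, the constrained problem described in the abstract can be sketched as follows. The notation is assumed here for illustration and is not taken from the paper: $f$ is the classifier with class scores $f_k$, $g$ is the attribution map, $D$ is a dissimilarity measure (Euclidean distance or a cosine-based dissimilarity), $\varepsilon$ is the perturbation radius, and $p \in \{2, \infty\}$.

$$
\max_{\delta} \; D\big(g(x),\, g(x+\delta)\big)
\quad \text{s.t.} \quad
\|\delta\|_p \le \varepsilon,
\qquad
\arg\max_k f_k(x+\delta) = \arg\max_k f_k(x)
$$

An upper bound on the optimal value of this problem bounds the attribution deviation that any attack within the $\varepsilon$-ball can achieve without changing the predicted class.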

## Key findings

- The proposed upper bounds quantify model robustness in terms of worst-case attribution dissimilarity across various datasets.
- The bounds are validated against two types of attacks: the PGD attack and the IFIA attribution attack.
- The experiments comprise over 10 million attacks.

## Abstract

Model attribution is a critical component of deep neural networks (DNNs) because it brings interpretability to complex models. Recent studies have drawn attention to the security of attribution methods, which are vulnerable to attribution attacks that generate similar images with dramatically different attributions. Existing works have investigated empirically improving the robustness of DNNs against those attacks; however, none of them explicitly quantifies the actual deviations of attributions. In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noise within a certain region while the classification results remain the same. Based on the formulation, different practical approaches are introduced to bound the attributions above using Euclidean distance and cosine similarity under both $\ell_2$- and $\ell_\infty$-norm perturbation constraints. The bounds developed by our theoretical study are validated on various datasets and two different types of attacks (PGD attack and IFIA attribution attack). Over 10 million attacks in the experiments indicate that the proposed upper bounds effectively quantify the robustness of models based on the worst-case attribution dissimilarities.
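
To make the measured quantities concrete, here is a minimal Python sketch of the empirical side of the question. It is not the paper's method and it computes no upper bound. It assumes a toy two-layer tanh network, a plain input-gradient (saliency) attribution, and uniform random sampling inside an $\ell_\infty$ ball. All names (`logits`, `saliency`, `worst_sampled_deviation`) are hypothetical. Random sampling can only underestimate the worst case, so a valid upper bound of the kind the abstract describes would have to sit at or above the values this returns.

```python
import numpy as np


def logits(W1, W2, x):
    """Class scores of a toy network: W2 @ tanh(W1 @ x). Illustrative only."""
    return W2 @ np.tanh(W1 @ x)


def saliency(W1, W2, x, label):
    """Gradient of the `label` logit with respect to the input x."""
    h = np.tanh(W1 @ x)
    # Chain rule: d logit_label / dx = W1^T (W2[label] * (1 - h^2))
    return W1.T @ (W2[label] * (1.0 - h ** 2))


def cosine_distance(a, b):
    """1 - cosine similarity, guarded against a zero denominator."""
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), 1e-12)
    return 1.0 - float(a @ b) / denom


def worst_sampled_deviation(W1, W2, x, eps, n_samples=2000, seed=0):
    """Largest attribution deviation seen over random l_inf perturbations.

    Only perturbations that leave the predicted class unchanged are kept,
    mirroring the constraint stated in the abstract.
    Returns (max Euclidean distance, max cosine distance, samples kept).
    """
    rng = np.random.default_rng(seed)
    label = int(np.argmax(logits(W1, W2, x)))
    base = saliency(W1, W2, x, label)
    max_l2, max_cos, kept = 0.0, 0.0, 0
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)  # ||delta||_inf <= eps
        x_pert = x + delta
        if int(np.argmax(logits(W1, W2, x_pert))) != label:
            continue  # prediction changed: outside the feasible set
        kept += 1
        attr = saliency(W1, W2, x_pert, label)
        max_l2 = max(max_l2, float(np.linalg.norm(attr - base)))
        max_cos = max(max_cos, cosine_distance(attr, base))
    return max_l2, max_cos, kept


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    W1 = rng.normal(size=(8, 6))   # hypothetical hidden-layer weights
    W2 = rng.normal(size=(3, 8))   # hypothetical output-layer weights
    x = rng.normal(size=6)         # hypothetical input sample
    l2, cos, kept = worst_sampled_deviation(W1, W2, x, eps=0.05)
    print(f"kept={kept}  max L2 deviation={l2:.4f}  max cosine distance={cos:.4f}")
```

An optimization-based attack would typically find larger deviations than uniform sampling does, which is why the sampled maximum is only a floor for the true worst case.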

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2303.00340/full.md

## Figures

27 figures with captions in the complete paper: https://tomesphere.com/paper/2303.00340/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/2303.00340/full.md

---
Source: https://tomesphere.com/paper/2303.00340