FAR: A General Framework for Attributional Robustness
Adam Ivankay, Ivan Girardi, Chiara Marchiori, Pascal Frossard

TL;DR
This paper introduces FAR, a flexible framework for training neural networks with attributional robustness, improving the stability of attribution maps against adversarial perturbations in vision tasks.
Contribution
The paper proposes a novel, general framework (FAR) for enhancing attributional robustness, along with two new instantiations, AAT and AdvAAT, that improve robustness and applicability.
Findings
FAR outperforms existing methods in attributional robustness on vision datasets.
AAT and AdvAAT effectively optimize for both robust attributions and accurate predictions.
The methods reduce dependency on certain training and estimation parameters.
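The joint goal of robust attributions and accurate predictions can be written as a single min-max training objective. The block below is a hedged sketch in our own notation, not the paper's exact formulation. $f_\theta$ is the model and $\mathcal{L}$ a task loss such as cross-entropy. $A_\theta(x)$ is the attribution map and $d$ a dissimilarity between maps. $\mathcal{N}_\epsilon(x)$ is a local neighbourhood of the input, assumed here to be an $L_\infty$ ball of radius $\epsilon$, and $\lambda$ is an assumed trade-off weight.

```latex
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
\Big[\, \mathcal{L}\big(f_\theta(x), y\big)
\;+\; \lambda \max_{\tilde{x} \in \mathcal{N}_\epsilon(x)}
d\big(A_\theta(x), A_\theta(\tilde{x})\big) \Big],
\qquad
\mathcal{N}_\epsilon(x) = \{\tilde{x} : \|\tilde{x} - x\|_\infty \le \epsilon\}
```

The first term rewards accurate predictions. The second penalizes the worst-case change of the attribution map within the neighbourhood.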
Abstract
Attribution maps are popular tools for explaining neural network predictions. By assigning each input dimension an importance value that represents its impact on the outcome, they give an intuitive explanation of the decision process. However, recent work has discovered that these maps are vulnerable to imperceptible adversarial changes, which can prove critical in safety-relevant domains such as healthcare. Therefore, we define a novel generic framework for attributional robustness (FAR) as a general problem formulation for training models with robust attributions. This framework consists of a generic regularization term and a training objective that minimize the maximal dissimilarity of attribution maps in a local neighbourhood of the input. We show that FAR is a generalized, less constrained formulation of currently existing training methods. We then propose two new instantiations…
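The regularizer described in the abstract, the maximal attribution dissimilarity in a local neighbourhood of the input, can be illustrated with a small NumPy sketch. Every concrete choice here is an assumption for illustration, not the paper's method. It uses a toy one-hidden-layer network, gradient × input as the attribution, and 1 − cosine similarity as the dissimilarity. The neighbourhood is an $L_\infty$ ball of radius `radius`, and random sampling stands in for the inner maximization, where an attack-based search would usually find worse cases. The function names `max_attr_dissimilarity` and `far_style_objective` are hypothetical.

```python
import numpy as np


def model_output(x, W1, b1, w2):
    """Toy scalar model f(x) = w2 . tanh(W1 x + b1) (hypothetical, for illustration)."""
    return float(w2 @ np.tanh(W1 @ x + b1))


def model_grad(x, W1, b1, w2):
    """Analytic input gradient: df/dx = W1^T (w2 * (1 - tanh(W1 x + b1)^2))."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ (w2 * (1.0 - h ** 2))


def attribution(x, params):
    """Gradient x input attribution map (one possible choice of attribution method)."""
    return x * model_grad(x, *params)


def dissimilarity(a, b, eps=1e-12):
    """1 - cosine similarity: 0 for identical directions, at most 2."""
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return 1.0 - float(a @ b) / denom


def max_attr_dissimilarity(x, params, radius, n_samples=64, rng=None):
    """Estimate max d(A(x), A(x')) over the L_inf ball of `radius` around x.

    Random sampling is only a cheap stand-in for the inner maximization;
    the result is a lower bound on the true maximum.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    a_x = attribution(x, params)
    worst = 0.0
    for _ in range(n_samples):
        # Uniform perturbation inside the L_inf ball.
        delta = radius * rng.uniform(-1.0, 1.0, size=x.shape)
        worst = max(worst, dissimilarity(a_x, attribution(x + delta, params)))
    return worst


def far_style_objective(task_loss, x, params, radius, lam):
    """Task loss plus lam times the estimated worst-case attribution dissimilarity."""
    return task_loss + lam * max_attr_dissimilarity(x, params, radius)


# Usage: random toy parameters and a single input.
rng = np.random.default_rng(42)
params = (rng.normal(size=(8, 4)), rng.normal(size=8), rng.normal(size=8))
x = rng.normal(size=4)
print(max_attr_dissimilarity(x, params, radius=0.1))
```

This sketch only evaluates the objective. Actual training would also differentiate the regularizer with respect to the model parameters, which in practice calls for an autodiff framework rather than hand-written gradients.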
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning in Healthcare
