Exploring the Interplay of Interpretability and Robustness in Deep Neural Networks: A Saliency-guided Approach

Amira Guesmi, Nishant Suresh Aswani, and Muhammad Shafique

arXiv:2405.06278 · cs.CV · May 13, 2024

TL;DR

This paper examines how Saliency-guided Training (SGT) can improve both the robustness and the interpretability of deep neural networks, particularly against adversarial attacks, by sharpening saliency maps and by combining SGT with adversarial training.

Contribution

It evaluates SGT as a way to boost both robustness and interpretability, and proposes combining SGT with adversarial training for stronger defense against attacks.

Findings

1. SGT improves robustness against PGD attacks by 35% on MNIST.

2. SGT enhances saliency map quality, aiding interpretability.

3. Combined SGT and adversarial training yields greater robustness.
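
The PGD (projected gradient descent) attack named in the first finding can be illustrated with a minimal sketch. This is not the paper's experimental setup: it assumes a toy logistic-regression model with hand-picked weights `w` and `b`, an L-infinity budget `epsilon`, and inputs scaled to [0, 1]. All names and values here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, b, x, y):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    tiny = 1e-12  # guards against log(0)
    loss = -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))
    grad_x = [(p - y) * wi for wi in w]  # closed-form input gradient for this model
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(w, b, x, y, epsilon=0.3, step=0.05, iters=20):
    """L-infinity PGD: take signed gradient ascent steps on the loss,
    then project back into the epsilon-ball around x and the valid range [0, 1]."""
    x_adv = list(x)
    for _ in range(iters):
        _, g = loss_and_grad(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xo - epsilon), xo + epsilon) for xa, xo in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Hypothetical toy model and input (assumptions for illustration only).
w = [2.0, -1.5, 0.5]
b = -0.2
x = [0.8, 0.2, 0.5]
y = 1

x_adv = pgd_attack(w, b, x, y)
clean_loss, _ = loss_and_grad(w, b, x, y)
adv_loss, _ = loss_and_grad(w, b, x_adv, y)
print("clean loss:", clean_loss, "adversarial loss:", adv_loss)
```

A robustness figure such as the one in the first finding is typically obtained by comparing accuracy on inputs like `x_adv` for models trained with and without the defense. The sketch only shows the attack loop itself.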

Abstract

Adversarial attacks pose a significant challenge to deploying deep learning models in safety-critical applications. Maintaining model robustness while ensuring interpretability is vital for fostering trust and comprehension in these models. This study investigates how Saliency-guided Training (SGT), a technique aimed at improving the clarity of saliency maps to deepen understanding of the model's decision-making process, affects model robustness. Experiments were conducted on standard benchmark datasets using various deep learning architectures trained with and without SGT. Findings demonstrate that SGT enhances both model robustness and interpretability. Additionally, we propose a novel approach combining SGT with standard adversarial training to achieve even greater robustness while preserving saliency map quality. Our strategy is grounded in the assumption that preserving salient…
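
The abstract does not spell out the SGT objective, so the following is only a sketch under stated assumptions. It follows the commonly described form of saliency-guided training: mask the input features with the smallest gradient magnitude, then add a KL-divergence term that keeps the model's output on the masked input close to its output on the original. The linear softmax model, the zero-masking rule, the KL direction, and the names `k` and `lam` are all assumptions, not details taken from this paper.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. x for a linear softmax model."""
    p = softmax(logits(W, b, x))
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    return [sum(dz[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]

def mask_least_salient(x, saliency, k):
    """Zero out the k features with the smallest |gradient| (assumed masking rule)."""
    order = sorted(range(len(x)), key=lambda j: abs(saliency[j]))
    drop = set(order[:k])
    return [0.0 if j in drop else xj for j, xj in enumerate(x)]

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def saliency_guided_loss(W, b, x, y, k=1, lam=1.0):
    """Cross-entropy on x plus lam * KL(f(x) || f(x_masked))."""
    p = softmax(logits(W, b, x))
    x_masked = mask_least_salient(x, input_gradient(W, b, x, y), k)
    q = softmax(logits(W, b, x_masked))
    return -math.log(p[y]) + lam * kl_divergence(p, q), x_masked

# Hypothetical two-class model and input (assumptions for illustration only).
W = [[1.0, 0.0, -1.0], [-1.0, 0.1, 1.0]]
b = [0.0, 0.0]
x = [1.0, 0.5, 0.2]
y = 0

total, x_masked = saliency_guided_loss(W, b, x, y, k=1, lam=1.0)
print("loss:", total, "masked input:", x_masked)
```

In a combined scheme of the kind the abstract proposes, a loss like this would presumably be evaluated on adversarial examples as well as clean ones. How the paper actually combines the two is not stated in the text available here.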

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning