Training for Trustworthy Saliency Maps: Adversarial Training Meets Feature-Map Smoothing

Dipkamal Bhusal; Md Tanvirul Alam; Nidhi Rastogi

arXiv:2603.07302·cs.CV·March 10, 2026

Open Access

TL;DR

This paper combines adversarial training with feature-map smoothing so that gradient-based saliency maps for image classifiers are less noisy and more stable, and therefore more trustworthy.

Contribution

The paper shows that adding feature-map smoothing to adversarial training improves the stability and perceived trustworthiness of saliency maps. It addresses explanation quality through the training procedure rather than by modifying the attribution algorithm.
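The two ingredients named above can be illustrated on a toy problem. The sketch below is not the paper's implementation: this listing does not state which attack or which smoothing operator the authors use, so a single FGSM step on a logistic model and a 3x3 mean filter over a 2D feature map stand in as assumptions. The names `fgsm_example` and `box_smooth`, and all numeric values, are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_example(x, y, w, b, eps):
    """One FGSM step on a logistic model: x + eps * sign(dLoss/dx).

    Assumption: FGSM stands in for whatever attack the paper trains with.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    # For binary cross-entropy, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

def box_smooth(fmap):
    """3x3 mean filter over a 2D feature map, averaging in-bounds cells only.

    Assumption: a box filter stands in for the paper's smoothing operator.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [fmap[a][c]
                    for a in range(max(0, i - 1), min(h, i + 2))
                    for c in range(max(0, j - 1), min(w, j + 2))]
            out[i][j] = sum(vals) / len(vals)
    return out

# Hypothetical inputs, chosen only to make the behaviour easy to inspect.
W_TOY = [0.5, -0.25]
x_adv = fgsm_example([1.0, 2.0], y=1, w=W_TOY, b=0.0, eps=0.1)
smoothed = box_smooth([[0.0, 0.0, 0.0],
                       [0.0, 9.0, 0.0],
                       [0.0, 0.0, 0.0]])
```

In an actual training loop, the perturbed input would be fed through a network whose intermediate feature maps pass through the smoothing step before the loss is computed. Where the smoothing is placed and how it is parameterized are design choices this sketch does not attempt to reproduce.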

Findings

01 Adversarial training yields sparser and more input-stable saliency maps, but can degrade output-side stability.

02 Feature-map smoothing improves both input-side and output-side stability.

03 A human study confirms increased perceived trustworthiness.

Abstract

Gradient-based saliency methods such as Vanilla Gradient (VG) and Integrated Gradients (IG) are widely used to explain image classifiers, yet the resulting maps are often noisy and unstable, limiting their usefulness in high-stakes settings. Most prior work improves explanations by modifying the attribution algorithm, leaving open how the training procedure shapes explanation quality. We take a training-centered view and first provide a curvature-based analysis linking attribution stability to how smoothly the input-gradient field varies locally. Guided by this connection, we study adversarial training and identify a consistent trade-off: it yields sparser and more input-stable saliency maps, but can degrade output-side stability, causing explanations to change even when predictions remain unchanged and logits vary only slightly. To mitigate this, we propose augmenting adversarial…
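For readers unfamiliar with the two attribution methods named in the abstract, here is a minimal sketch on a toy model. It assumes a hypothetical three-feature logistic regression (`W`, `B`, and the input values are made up for illustration) so that the input gradient can be written analytically. Vanilla Gradient is the gradient of the output with respect to the input. Integrated Gradients averages that gradient along a straight path from a baseline to the input and scales it by the input difference. The integral is approximated here with a midpoint Riemann sum.

```python
import math

# Hypothetical toy model: logistic regression on a 3-feature input.
W = [2.0, -1.0, 0.5]
B = 0.1

def model(x):
    """Return sigmoid(W . x + B)."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def vanilla_gradient(x):
    """Analytic gradient of the model output with respect to the input."""
    p = model(x)
    return [p * (1.0 - p) * w for w in W]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of IG along the straight path."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = vanilla_gradient(point)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]
vg = vanilla_gradient(x)
ig = integrated_gradients(x, baseline)

# Completeness check: IG attributions should sum to f(x) - f(baseline),
# up to the numerical error of the Riemann sum.
completeness_gap = abs(sum(ig) - (model(x) - model(baseline)))
```

For an image classifier the gradient would come from automatic differentiation rather than a closed form, and the choice of baseline (an all-zero input here) is itself an assumption that affects the resulting map.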

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI