Input Bias in Rectified Gradients and Modified Saliency Maps
Lennart Brocki, Neo Christopher Chung

TL;DR
This paper investigates input bias in saliency maps generated by methods like Rectified Gradients, revealing that such biases can mislead interpretability, and proposes a simple modification to improve the true explainability of these visualizations.
Contribution
The paper identifies input bias issues in Rectified Gradients and similar methods and introduces a straightforward modification that enhances the interpretability of saliency maps.
Findings
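Rectified Gradients and similar modified saliency methods introduce a strong input bias, for example toward bright regions in the RGB space.
Dark image areas are underrepresented in the resulting saliency maps, even when they are relevant to the prediction.
Removing the multiplication by input features reduces this bias and brings the visualizations closer to what the model actually relies on.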
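The brightness effect described in these findings can be illustrated with a toy example. The sketch below is not the paper's implementation of Rectified Gradients; it assumes a hypothetical linear score f(x) = w · x and only compares a plain gradient map with a gradient-times-input map. The weights, pixel values, and function names are invented for illustration.

```python
import numpy as np

def gradient_saliency(w, x):
    # For a linear score f(x) = w . x, the gradient with respect to x is w,
    # so the saliency does not depend on how bright each pixel is.
    return np.abs(w)

def gradient_times_input(w, x):
    # Multiplying by the input scales each attribution by pixel intensity,
    # so a pixel with value 0 always receives zero attribution.
    return np.abs(w * x)

# Hypothetical 3-pixel "image": pixel 0 is dark but has a large weight,
# pixel 1 is bright with a large weight, pixel 2 is bright with a small weight.
w = np.array([2.0, 2.0, 0.1])
x = np.array([0.0, 1.0, 1.0])

grad = gradient_saliency(w, x)
gxi = gradient_times_input(w, x)

print("gradient:        ", grad)
print("gradient * input:", gxi)
```

In this toy setting the plain gradient treats pixels 0 and 1 as equally important, while the input-multiplied map ranks the dark pixel 0 below the weakly weighted bright pixel 2. This is the kind of input bias the findings refer to, shown here only for a linear model.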
Abstract
Interpretation and improvement of deep neural networks rely on a better understanding of their underlying mechanisms. In particular, gradients of classes or concepts with respect to the input features (e.g., pixels in images) are often used as importance scores or estimators, which are visualized in saliency maps. Thus, a family of saliency methods provides an intuitive way to identify input features with substantial influence on classifications or latent concepts. Several modifications to conventional saliency maps, such as Rectified Gradients and Layer-wise Relevance Propagation (LRP), have been introduced to allegedly denoise and improve interpretability. While visually coherent in certain cases, Rectified Gradients and other modified saliency maps introduce a strong input bias (e.g., brightness in the RGB space) because of inappropriate uses of the input features. We demonstrate…
