SaliencyDecor: Enhancing Neural Network Interpretability through Feature Decorrelation
Ali Karkehabadi, Jamshid Hassanpour, Houman Homayoun, Avesta Sasan

TL;DR
SaliencyDecor is a training framework that improves neural network interpretability by enforcing feature decorrelation, resulting in sharper saliency maps and better predictive accuracy without changing model architecture.
Contribution
The paper introduces a decorrelation regularizer applied during training to improve saliency map quality and model performance, addressing the noisy, unstable explanations that gradient-based interpretability methods tend to produce.
Findings
Sharper, more object-focused saliency maps produced
Improved predictive accuracy across multiple datasets
Enhanced interpretability without architectural changes
Abstract
Gradient-based saliency methods are widely used to interpret deep neural networks, yet they often produce noisy and unstable explanations that poorly align with semantically meaningful input features. We argue that a fundamental cause of this behavior lies in the geometry of learned representations: correlated feature dimensions diffuse attribution gradients across redundant directions, resulting in blurred and unreliable saliency maps. To address this issue, we identify feature correlation as a structural limitation of gradient-based interpretability and propose SaliencyDecor, a training framework that enforces feature decorrelation to improve attribution fidelity without modifying saliency methods or model architectures. By reshaping the feature space toward orthogonality, our approach promotes more concentrated gradient flow and improves the fidelity of saliency-based explanations.…
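The abstract as shown here does not give the exact form of the decorrelation term, so the following is only a minimal sketch of one common way such a penalty can be written: the mean squared off-diagonal entry of the batch feature correlation matrix, added to the task loss with a weight. The function names, the `lam` weight, and the choice of correlation (rather than covariance) are assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def decorrelation_penalty(features, eps=1e-8):
    """Mean squared off-diagonal correlation between feature dimensions.

    Hypothetical helper, not taken from the paper.
    features: array of shape (batch, dim) with dim >= 2, one row per example.
    Returns 0.0 when all feature dimensions are uncorrelated over the batch.
    """
    features = np.asarray(features, dtype=np.float64)
    # Standardize each feature dimension over the batch.
    centered = features - features.mean(axis=0, keepdims=True)
    normalized = centered / (centered.std(axis=0, keepdims=True) + eps)
    # Empirical correlation matrix of shape (dim, dim).
    corr = normalized.T @ normalized / features.shape[0]
    # Zero out the diagonal so only cross-feature correlations are penalized.
    off_diag = corr - np.diag(np.diag(corr))
    dim = corr.shape[0]
    return float((off_diag ** 2).sum() / (dim * (dim - 1)))


def total_loss(task_loss, features, lam=0.1):
    """Task loss plus the weighted decorrelation penalty (lam is an assumed weight)."""
    return task_loss + lam * decorrelation_penalty(features)
```

In this sketch, two identical feature columns give a penalty close to 1, while mutually uncorrelated columns give 0, so minimizing the combined loss pushes the feature space toward the orthogonality the abstract describes. In an actual training loop the same computation would be written in an autodiff framework so that the penalty contributes gradients.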
