Noise Modulation: Let Your Model Interpret Itself
Haoyang Li, Xinggang Wang

TL;DR
This paper introduces noise modulation, a novel, efficient, and model-agnostic method inspired by amplitude modulation to enhance the interpretability of deep neural networks through input-gradient analysis.
Contribution
It proposes noise modulation as a new approach that approximates adversarial perturbations, improving interpretability without the inefficiencies of adversarial training.
Findings
Noise modulation effectively increases interpretability of input-gradients.
It is model-agnostic and more efficient than adversarial training.
Experimental results validate the method's effectiveness.
Abstract
Given the great success of Deep Neural Networks (DNNs) and their black-box nature, the interpretability of these models has become an important issue. Most previous research focuses on post-hoc interpretation of a trained model. Recently, however, adversarial training has shown that a model can acquire an interpretable input-gradient through training. Adversarial training, though, is an inefficient route to interpretability. To resolve this problem, we construct an approximation of the adversarial perturbations and discover a connection between adversarial training and amplitude modulation. Based on a digital analogy, we propose noise modulation as an efficient and model-agnostic alternative for training a model that interprets itself with input-gradients. Experimental results show that noise modulation can effectively increase the interpretability of input-gradients in a model-agnostic way.
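For readers unfamiliar with the term, an input-gradient is the gradient of a model's output with respect to its input; its magnitude per feature is often read as a saliency score. The sketch below only illustrates that quantity on a toy logistic model and is not the paper's noise modulation method. The model, the weights `w`, the bias `b`, the input `x`, and all function names are assumptions made up for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Toy model (assumed for illustration): logistic score for input x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x):
    """Analytic gradient of the score with respect to the input x.

    For p = sigmoid(w . x + b), dp/dx_i = p * (1 - p) * w_i.
    """
    p = predict(w, b, x)
    return [p * (1.0 - p) * wi for wi in w]

def numeric_input_gradient(w, b, x, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((predict(w, b, hi) - predict(w, b, lo)) / (2 * eps))
    return grad

# Hypothetical weights and input, chosen only to make the example concrete.
w = [2.0, -1.0, 0.0]
b = 0.1
x = [0.5, 0.25, 1.0]

# Per-feature saliency: absolute value of the input-gradient.
saliency = [abs(g) for g in input_gradient(w, b, x)]
```

In this toy case the third feature has zero weight, so its saliency is zero, and the first feature is the most salient. For a deep network the same gradient would normally be obtained with automatic differentiation rather than a closed form.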
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
