Feature Perturbation Augmentation for Reliable Evaluation of Importance Estimators in Neural Networks
Lennart Brocki, Neo Christopher Chung

TL;DR
This paper introduces feature perturbation augmentation (FPA), a data augmentation method that makes neural networks more robust to feature perturbations, so that importance estimators can be evaluated more reliably and with fewer perturbation artifacts.
Contribution
FPA is a novel data augmentation technique that enhances the reliability of importance estimator evaluation in neural networks.
Findings
FPA makes neural networks more robust against feature perturbations.
Training with FPA reveals that importance score signs can better explain model behavior.
FPA improves the evaluation process of post-hoc interpretability methods.
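As a rough illustration of the idea behind these findings, the sketch below augments a training batch with perturbed copies so that a model would see perturbed inputs during training. This summary does not specify how FPA chooses or replaces features, so the random feature selection, the `max_fraction` parameter, the constant `baseline` value, and the name `perturb_batch` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def perturb_batch(batch, rng, max_fraction=0.5, baseline=0.0):
    """Return a copy of `batch` with a random subset of features set to `baseline`.

    Hypothetical helper: the perturbation scheme here is an assumption.
    """
    # astype(float) copies, so the caller's batch is left unchanged.
    flat = batch.reshape(len(batch), -1).astype(float)
    n_features = flat.shape[1]
    max_k = int(max_fraction * n_features)
    for row in flat:
        k = int(rng.integers(0, max_k + 1))      # how many features to perturb
        idx = rng.permutation(n_features)[:k]    # which features to perturb
        row[idx] = baseline                      # in-place write on the row view
    return flat.reshape(batch.shape)

rng = np.random.default_rng(0)
images = np.ones((8, 4, 4))                      # toy stand-in for a training batch
augmented = perturb_batch(images, rng)
train_batch = np.concatenate([images, augmented])  # originals plus perturbed copies
```

In a real training loop the perturbed copies would keep the labels of their originals, and the perturbation used during training would presumably match the one used later when evaluating importance estimators.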
Abstract
Post-hoc explanation methods attempt to make the inner workings of deep neural networks more interpretable. However, since a ground truth is in general lacking, local post-hoc interpretability methods, which assign importance scores to input features, are challenging to evaluate. One of the most popular evaluation frameworks is to perturb features deemed important by an interpretability method and to measure the change in prediction accuracy. Intuitively, a large decrease in prediction accuracy would indicate that the explanation has correctly quantified the importance of features with respect to the prediction outcome (e.g., logits). However, the change in the prediction outcome may stem from perturbation artifacts, since perturbed samples in the test dataset are out of distribution (OOD) compared to the training dataset and can therefore potentially disturb the model in an unexpected…
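The perturbation-based evaluation described in the abstract can be sketched as follows: rank features by their importance scores, perturb them one at a time starting with the most important, and record the model output after each step. The linear toy model, the zero baseline, and the function name `perturbation_curve` are assumptions for illustration only; the abstract names logits as one possible prediction outcome, and a single scalar output stands in for that here.

```python
import numpy as np

def perturbation_curve(predict_fn, x, scores, baseline=0.0):
    """Model outputs as features are perturbed from most to least important."""
    order = np.argsort(-scores)            # most important feature first
    x_pert = x.astype(float).copy()
    outputs = [predict_fn(x_pert)]         # output on the unperturbed input
    for idx in order:
        x_pert[idx] = baseline             # replace the feature with the baseline
        outputs.append(predict_fn(x_pert))
    return np.array(outputs)

# Toy linear "model" standing in for a network logit (hypothetical).
w = np.array([0.5, 2.0, -1.0, 0.1])

def predict(v):
    return float(w @ v)

x = np.ones(4)
scores = w * x                             # gradient-times-input scores for a linear model
curve = perturbation_curve(predict, x, scores)
```

A steep early drop in `curve` is what this framework reads as evidence that the scores rank features well. For a real network, part of that drop may instead come from the perturbed input being out of distribution, which is the artifact the abstract describes.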
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Test
