TL;DR
This paper introduces an adversarial-attack-based framework for localizing discriminative features in black-box models, with statistically guaranteed interpretability of the localized features. Validated on MNIST images and MIT-BIH ECG signals, it identifies compact image regions and subtle, biologically plausible ECG features.
Contribution
It develops a localization method based on adversarial attacks whose interpretability carries a statistical guarantee, in contrast to existing heuristic approaches.
Findings
On MNIST, the localized image regions are compact and visually appealing.
On MIT-BIH, the identified ECG features are biologically plausible and consistent with electrophysiology.
The method compares favorably with state-of-the-art techniques.
Abstract
In explainable artificial intelligence, discriminative feature localization is critical to reveal a black-box model's decision-making process from raw data to prediction. In this article, we use two real datasets, the MNIST handwritten digits and MIT-BIH Electrocardiogram (ECG) signals, to motivate key characteristics of discriminative features, namely adaptiveness, predictive importance and effectiveness. Then, we develop a localization framework based on adversarial attacks to effectively localize discriminative features. In contrast to existing heuristic methods, we also provide a statistically guaranteed interpretability of the localized features by measuring a generalized partial R². We apply the proposed method to the MNIST dataset and the MIT-BIH dataset with a convolutional auto-encoder. In the first, the compact image regions localized by the proposed method are visually…
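The idea of localizing features with an attack can be illustrated on a toy problem. The sketch below is not the paper's method: it assumes a fixed linear logistic classifier instead of a convolutional auto-encoder, a simple gradient-sign perturbation restricted to the k most loss-sensitive features per sample, and an R²-style score defined here as one minus the ratio of the original loss to the attacked loss. The helper names (`logistic_loss`, `sparse_attack`, `loss_ratio_r2`), the data, and that score definition are all assumptions made for illustration.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Mean logistic loss of a linear classifier w for labels y in {0, 1}."""
    z = X @ w
    # log(1 + exp(z)) - y * z, evaluated stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def sparse_attack(w, X, y, k, eps):
    """Shift each sample's k most loss-sensitive features by eps (hypothetical helper)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))               # predicted P(y = 1)
    grad = (p - y)[:, None] * w[None, :]             # d loss / d x, one row per sample
    top = np.argsort(-np.abs(grad), axis=1)[:, :k]   # k largest |gradient| entries
    mask = np.zeros(X.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)       # mark the localized features
    X_adv = X + eps * np.sign(grad) * mask           # move only those features uphill
    return X_adv, mask

def loss_ratio_r2(w, X, X_adv, y):
    """Assumed R2-style score: 1 - loss(original) / loss(attacked)."""
    return 1.0 - logistic_loss(w, X, y) / logistic_loss(w, X_adv, y)

# Toy data (made up): only the first two of ten features carry much signal.
rng = np.random.default_rng(0)
w = np.array([3.0, -2.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
X = rng.normal(size=(200, 10))
y = (X @ w + rng.normal(size=200) > 0).astype(float)

X_adv, mask = sparse_attack(w, X, y, k=2, eps=0.5)
r2 = loss_ratio_r2(w, X, X_adv, y)
print("localized features per sample:", mask.sum(axis=1)[:5])
print("R2-style score:", round(r2, 3))
```

Because the toy model is linear, the same two features are selected for every sample, so this sketch does not show the adaptiveness the abstract describes. A score near 1 would indicate that perturbing the localized features destroys most of the model's predictive performance, which is the intuition behind treating them as discriminative.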
