Robust Ranking Explanations
Chao Chen, Chenghua Guo, Guixiang Ma, Ming Zeng, Xi Zhang, Sihong Xie

TL;DR
This paper introduces R2ET, a novel algorithm that enhances the robustness of top salient feature explanations in machine learning models against adversarial attacks, ensuring more trustworthy interpretability.
Contribution
It defines explanation thickness and develops a surrogate bound to maximize it, connecting robustness with adversarial training, and demonstrates effectiveness across various models and data types.
Findings
R2ET improves explanation robustness under stealthy attacks.
It maintains model accuracy while enhancing interpretability.
The paper proves a theoretical connection between R2ET and adversarial training.
Abstract
Robust explanations of machine learning models are critical to establishing human trust in the models. Because of limited cognitive capacity, most humans can interpret only the top few salient features. It is therefore critical to make the top salient features robust to adversarial attacks, especially attacks against the more vulnerable gradient-based explanations. Existing defenses measure robustness using ℓ_p-norms, which offer weaker protection. We define explanation thickness to measure the ranking stability of salient features, and derive tractable surrogate bounds of the thickness to design the R2ET algorithm, which efficiently maximizes the thickness and anchors the top salient features. Theoretically, we prove a connection between R2ET and adversarial training. Experiments with a wide spectrum of network architectures and data modalities, including brain networks, demonstrate that R2ET attains…
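To make the idea of ranking stability concrete, the following is a minimal sketch in plain Python. It is not the paper's definition of explanation thickness and not the R2ET algorithm. It only illustrates, under simplifying assumptions, why a small saliency gap at the top-k boundary leaves a ranking fragile. The function names (`top_k`, `boundary_gap`, `top_k_overlap`) and the saliency values are hypothetical and chosen for illustration.

```python
def top_k(saliency, k):
    """Return the indices of the k most salient features, most salient first."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    return order[:k]


def boundary_gap(saliency, k):
    """Gap between the k-th and (k+1)-th largest saliency values.

    Illustrative proxy only (an assumption, not the paper's thickness):
    a small gap means a small change in the explanation can push a
    feature out of the top-k set.
    """
    if not 0 < k < len(saliency):
        raise ValueError("k must satisfy 0 < k < number of features")
    ranked = sorted(saliency, reverse=True)
    return ranked[k - 1] - ranked[k]


def top_k_overlap(saliency_a, saliency_b, k):
    """Fraction of top-k features shared by two saliency vectors."""
    shared = set(top_k(saliency_a, k)) & set(top_k(saliency_b, k))
    return len(shared) / k


# Hypothetical saliency scores before and after a small perturbation.
original = [0.90, 0.52, 0.50, 0.10]
perturbed = [0.88, 0.49, 0.53, 0.11]

print(top_k(original, 2))                      # [0, 1]
print(top_k(perturbed, 2))                     # [0, 2]
print(top_k_overlap(original, perturbed, 2))   # 0.5
```

In this toy example, features 1 and 2 are separated by a gap of only about 0.02, so a perturbation of comparable size swaps them and halves the top-2 overlap. A method that widens such gaps, as the abstract describes R2ET doing for its thickness measure, would make the top-ranked features harder to displace.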
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
