Certified Interpretability Robustness for Class Activation Mapping
Alex Gu, Tsui-Wei Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel

TL;DR
This paper introduces CORGI, an algorithm that takes an input image and certifies a lower bound on the robustness of the top-k pixels of its CAM interpretability map, demonstrated in a case study on traffic sign data.
Contribution
The paper proposes CORGI, the first method to certify interpretability robustness bounds for CAM maps in deep networks, addressing a key gap in model interpretability safety.
Findings
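CORGI certifies a lower bound on the minimum adversarial perturbation needed to change the top-k pixels of a CAM interpretability map.
On traffic sign data, CORGI's certified lower bounds are not far from (4-5x) the perturbations found by state-of-the-art attack methods.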
The approach improves confidence in interpretability robustness for autonomous driving applications.
Abstract
Interpreting machine learning models is challenging but crucial for ensuring the safety of deep networks in autonomous driving systems. Due to the prevalence of deep learning based perception models in autonomous vehicles, accurately interpreting their predictions is crucial. While a variety of such methods have been proposed, most are shown to lack robustness. Yet, little has been done to provide certificates for interpretability robustness. Taking a step in this direction, we present CORGI, short for Certifiably prOvable Robustness Guarantees for Interpretability mapping. CORGI is an algorithm that takes in an input image and gives a certifiable lower bound for the robustness of the top k pixels of its CAM interpretability map. We show the effectiveness of CORGI via a case study on traffic sign data, certifying lower bounds on the minimum adversarial perturbation not far from (4-5x)…
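To make "the top k pixels of its CAM interpretability map" concrete, the sketch below computes a class activation map in the standard way (a weighted sum of the last convolutional feature maps, using the final-layer weights of one class) and extracts its top-k pixel set. It is a minimal NumPy illustration, not the CORGI algorithm: the function names `cam_map`, `top_k_pixels`, and `top_k_preserved` are hypothetical, and treating "robust" as "the top-k set is unchanged" is an assumption made here for illustration.

```python
import numpy as np


def cam_map(features, class_weights):
    """Class activation map for one class.

    features:      (C, H, W) activations of the last convolutional layer.
    class_weights: (C,) final-layer weights for the class of interest.
    Returns an (H, W) map: sum over channels of weight * feature map.
    """
    return np.tensordot(class_weights, features, axes=1)


def top_k_pixels(cam, k):
    """Return the set of (row, col) positions of the k largest CAM values."""
    order = np.argsort(cam.ravel())[::-1][:k]
    return {tuple(int(v) for v in np.unravel_index(i, cam.shape)) for i in order}


def top_k_preserved(cam_a, cam_b, k):
    """Assumed robustness check: do two CAM maps share the same top-k set?"""
    return top_k_pixels(cam_a, k) == top_k_pixels(cam_b, k)


# Tiny worked example: 2 channels, 2x2 spatial grid.
features = np.arange(8, dtype=float).reshape(2, 2, 2)
weights = np.array([1.0, 0.0])          # only channel 0 contributes
cam = cam_map(features, weights)        # equals features[0]
top2 = top_k_pixels(cam, 2)

# A small perturbation of the features that leaves the ranking intact.
perturbed = cam_map(features + 0.01, weights)
same = top_k_preserved(cam, perturbed, 2)
```

An attack searches for a small input perturbation that makes a check like `top_k_preserved` fail, which gives an upper bound on the minimum such perturbation. A certificate such as the one the abstract describes instead gives a lower bound: a perturbation size below which the top-k pixels provably cannot change.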
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Class-activation map
