Towards Robust and Accurate Stability Estimation of Local Surrogate Models in Text-based Explainable AI

Christopher Burger; Charles Walter; Thai Le; Lingwei Chen

arXiv:2501.02042 · cs.LG · January 7, 2025



Open Access

TL;DR

This paper evaluates the robustness of local surrogate models in text-based explainable AI against adversarial attacks, revealing that many similarity measures are overly sensitive and proposing a weighting scheme to improve stability estimation.

Contribution

It systematically compares similarity measures for text explanations and introduces a weighting scheme that accounts for feature synonymity to enhance robustness assessment.
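
To illustrate why the choice of similarity measure matters, here is a minimal sketch in Python. It is not the paper's weighting scheme, which this summary does not describe; it only contrasts a plain Jaccard similarity over top-k explanation features with a variant that treats synonyms as the same feature. The function names, the example words, and the `synonyms` table are all hypothetical.

```python
def jaccard(a, b):
    """Plain Jaccard similarity between two collections of explanation features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty explanations are treated as identical
    return len(a & b) / len(a | b)


def synonym_aware_jaccard(a, b, synonyms):
    """Jaccard similarity after mapping each feature to a canonical synonym.

    `synonyms` is a hypothetical dict from a word to its canonical form;
    words missing from the dict are kept unchanged.
    """
    def canon(word):
        return synonyms.get(word, word)

    return jaccard({canon(w) for w in a}, {canon(w) for w in b})


# Top-3 features of an explanation before and after a synonym-swap perturbation.
original = ["terrible", "movie", "boring"]
perturbed = ["awful", "movie", "boring"]

# Hypothetical synonym table; a real one might come from a thesaurus or embeddings.
synonyms = {"awful": "terrible"}

plain = jaccard(original, perturbed)
aware = synonym_aware_jaccard(original, perturbed, synonyms)
print(plain, aware)  # 0.5 1.0
```

Under the plain measure, a single synonym swap makes the two explanations look only half similar, which would overstate the instability of the explainer. The synonym-aware variant scores them as identical, which is the kind of sensitivity difference the paper's comparison is concerned with.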

Findings

01. Many similarity measures are overly sensitive, leading to inaccurate stability estimates.

02. The proposed weighting scheme improves the accuracy of adversarial robustness evaluation.

03. The study highlights the importance of appropriate similarity measures in XAI robustness analysis.

Abstract

Recent work has investigated adversarial attacks on explainable AI (XAI) in the NLP domain, with a focus on the vulnerability of local surrogate methods such as LIME to adversarial perturbations, or small changes to the input of a machine learning (ML) model. In such attacks, the generated explanation is manipulated while the meaning and structure of the original input remain similar under the ML model. Such attacks are especially alarming when XAI is used as a basis for decision making (e.g., prescribing drugs based on AI medical predictors) or for legal action (e.g., a legal dispute involving AI software). Although weaknesses have been shown to exist across many XAI methods, the reasons why remain little explored. Central to this XAI manipulation is the similarity measure used to calculate how one explanation differs from another. A poor choice of…


Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling

Methods: Local Interpretable Model-Agnostic Explanations · Focus