Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

Prathyusha Devabhakthini; Sasmita Parida; Raj Mani Shukla; Suvendu Chandan Nayak; Tapadhir Das

arXiv:2307.08327 · cs.LG · September 15, 2025

Open Access

TL;DR

This paper investigates how adversarial attacks affect the interpretability of machine learning models in text classification, highlighting the vulnerability of explainability to malicious input modifications.

Contribution

It introduces a method to analyze the impact of adversarial perturbations on model explainability in text classification tasks.

Findings

01. Adversarial attacks significantly alter model explanations.

02. Model performance degrades after adversarial perturbations.

03. Explainability metrics change notably post-attack.

Abstract

Adversarial attacks are a type of attack on machine learning models in which an attacker deliberately modifies the inputs to cause the model to make incorrect predictions. They can have serious consequences, particularly in applications such as autonomous vehicles, medical diagnosis, and security systems. Work on the vulnerability of deep learning models to adversarial attacks has shown that it is easy to craft samples that cause a model to make predictions it was not intended to make. In this work, we analyze the impact of adversarial attacks on model interpretability in text classification problems. We develop an ML-based classification model for text data. Then, we introduce adversarial perturbations to the text data to understand the classification performance after the attack. Subsequently, we analyze and interpret the model's explainability before and after the attack.
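The three steps in the abstract (train a text classifier, perturb the input, compare explanations before and after) can be illustrated with a toy sketch. The abstract does not name the classifier, the attack, or the explanation method, so everything below is an assumption made for illustration: a Naive Bayes style log-odds model, hand-picked character-level substitutions standing in for an adversarial attack, per-token log-odds contributions standing in for an explainer, and a hypothetical six-sentence corpus. It is not the authors' method.

```python
import math
from collections import Counter

# Toy labelled corpus (hypothetical data): 1 = positive, 0 = negative.
TRAIN = [
    ("great movie with wonderful acting", 1),
    ("wonderful story and great cast", 1),
    ("loved the great soundtrack", 1),
    ("terrible movie with awful acting", 0),
    ("awful story and boring cast", 0),
    ("hated the boring soundtrack", 0),
]

def train_log_odds(data):
    """Fit per-word log-odds weights (positive vs. negative) with add-one smoothing."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in data:
        counts[label].update(text.split())
        docs[label] += 1
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    weights = {
        w: math.log((counts[1][w] + 1) / totals[1])
        - math.log((counts[0][w] + 1) / totals[0])
        for w in vocab
    }
    bias = math.log(docs[1] / docs[0])
    return weights, bias

def predict(text, weights, bias):
    """Return 1 if the summed log-odds are positive, else 0."""
    score = bias + sum(weights.get(w, 0.0) for w in text.split())
    return 1 if score > 0 else 0

def explain(text, weights):
    """Per-token attribution: each word's log-odds contribution (0 if unseen)."""
    return {w: weights.get(w, 0.0) for w in text.split()}

def perturb(text, substitutions):
    """Stand-in for an attack: swap selected words for look-alike variants."""
    return " ".join(substitutions.get(w, w) for w in text.split())

def top_k(attribution, k=2):
    """The k tokens with the largest absolute attribution."""
    ranked = sorted(attribution, key=lambda w: abs(attribution[w]), reverse=True)
    return set(ranked[:k])

def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

weights, bias = train_log_odds(TRAIN)

original = "great movie with wonderful cast but hated soundtrack"
# Hypothetical substitutions chosen by hand; a real attack would search for them.
adversarial = perturb(original, {"great": "gr8", "wonderful": "wonderfull"})

pred_before = predict(original, weights, bias)
pred_after = predict(adversarial, weights, bias)
top_before = top_k(explain(original, weights))
top_after = top_k(explain(adversarial, weights))
overlap = jaccard(top_before, top_after)

print("prediction before/after:", pred_before, pred_after)
print("top tokens before:", sorted(top_before))
print("top tokens after: ", sorted(top_after))
print("top-k overlap:", overlap)
```

In this toy setting, the misspelled words fall outside the vocabulary and contribute nothing, so both the predicted label and the set of most influential tokens change. The top-k overlap is one simple way to quantify how far an explanation shifts; other measures, such as rank correlation over all tokens, would be equally reasonable choices.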

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications