Attacking interpretable NLP systems

Eldor Abdukhamidov; Tamer Abuhmed; Joanna C. S. Santos; Mohammed Abuhamad

arXiv:2507.16164 · cs.CR · July 23, 2025

TL;DR

This paper presents AdvChar, a black-box attack that makes subtle modifications to text inputs to deceive interpretable NLP systems. The adversarial text stays semantically similar to the original and yields an interpretation close to that of the benign input, which exposes how trust in model transparency can be exploited.

Contribution

Introduces AdvChar, a novel character-level attack that effectively misleads interpretable NLP models with minimal text modifications, highlighting security concerns.

Findings

01  AdvChar reduces model accuracy significantly with minimal text changes.
02  The attack maintains high similarity between original and adversarial inputs.
03  It successfully fools multiple NLP and interpretation models.

Abstract

Studies have shown that machine learning systems are vulnerable to adversarial examples in theory and practice. While previous attacks have focused mainly on visual models, exploiting the difference between human and machine perception, text-based models have also fallen victim to these attacks. However, attacks on text often fail to preserve the semantic meaning of the text and its similarity to the original. This paper introduces AdvChar, a black-box attack on interpretable natural language processing systems, designed to mislead the classifier while keeping the interpretation similar to that of benign inputs, thus exploiting trust in system transparency. AdvChar achieves this by making less noticeable modifications to the text input, forcing the deep learning classifier to make incorrect predictions while preserving the original interpretation. We use an interpretation-focused scoring approach to determine the most…
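
To make the general idea concrete, here is a minimal Python sketch of a greedy, character-level black-box perturbation guided by word importance. It is not the AdvChar algorithm: the abstract is cut off before it describes the scoring approach, so the sketch substitutes a plain leave-one-out importance score, and it does not model interpretation preservation at all. The toy classifier, its `WEIGHTS` table, and the helper names (`predict_proba`, `word_importance`, `swap_inner_chars`, `attack`) are all hypothetical stand-ins chosen for illustration.

```python
import math

# Hypothetical toy "black box": a keyword-weighted sentiment scorer.
# It stands in for the target classifier and is not taken from the paper.
WEIGHTS = {"great": 2.0, "love": 1.5, "slow": -0.5, "awful": -2.5}


def predict_proba(text: str) -> float:
    """Return P(positive); the attack only ever sees this output."""
    score = sum(WEIGHTS.get(w, 0.0) for w in text.lower().split())
    return 1.0 / (1.0 + math.exp(-score))


def word_importance(words: list[str]) -> list[float]:
    """Leave-one-out score: change in P(positive) when one word is removed.

    Assumption: used here in place of the paper's scoring approach.
    """
    base = predict_proba(" ".join(words))
    return [
        abs(base - predict_proba(" ".join(words[:i] + words[i + 1:])))
        for i in range(len(words))
    ]


def swap_inner_chars(word: str) -> str:
    """Swap the second and third characters, e.g. 'great' -> 'gerat'."""
    if len(word) < 4:
        return word  # too short to perturb without touching first/last char
    return word[0] + word[2] + word[1] + word[3:]


def attack(text: str, max_edits: int = 2) -> str:
    """Perturb the highest-scoring words until the predicted label flips."""
    words = text.split()
    original_label = predict_proba(text) >= 0.5
    scores = word_importance(words)
    ranked = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    for i in ranked[:max_edits]:
        words[i] = swap_inner_chars(words[i])
        if (predict_proba(" ".join(words)) >= 0.5) != original_label:
            break  # label flipped; stop to keep the edit count minimal
    return " ".join(words)


original = "great acting but slow plot"
adversarial = attack(original)
print(adversarial)
```

With this toy scorer, the single swap turns "great" into "gerat", which the scorer no longer recognizes, so the predicted label flips from positive to negative after one two-character transposition. A real interpretable system would also require comparing the attribution maps of the original and perturbed inputs, which this sketch leaves out.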

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI