Evaluating Text Classification Robustness to Part-of-Speech Adversarial Examples

Anahita Samadi; Allison Sullivan

arXiv:2408.08374 · cs.CL · August 19, 2024

Open Access

TL;DR

This paper investigates the robustness of text classification models against adversarial examples, revealing biases in CNNs related to parts of speech that expose vulnerabilities in linguistic understanding.

Contribution

It identifies specific parts of speech that significantly impact CNN-based classifiers, highlighting a critical vulnerability in their linguistic processing.

Findings

01. CNNs show bias against certain parts of speech in review datasets.

02. Part-of-speech tokens significantly influence classifier decisions.

03. Vulnerabilities in CNN linguistic processing are uncovered.

Abstract

As machine learning systems become more widely used, especially for safety-critical applications, there is a growing need to ensure that these systems behave as intended, even in the face of adversarial examples. Adversarial examples are inputs designed to trick the decision-making process while remaining imperceptible to humans. For text-based classification systems, however, any change to the input, a string of text, is always perceptible. Text-based adversarial examples therefore focus on preserving semantics instead. Unfortunately, recent work has shown that this goal is often not met. To improve the quality of text-based adversarial examples, we need to know which elements of the input text are worth focusing on. To address this, in this paper we explore which parts of speech have the highest impact on text-based classifiers. Our experiments highlight a distinct…
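As a rough illustration of what "impact of a part of speech on a classifier" can mean, the sketch below deletes every token carrying a given tag and records how often the predicted label flips. This is not the authors' procedure: the lexicon `TOY_POS`, the weights `TOY_WEIGHTS`, the linear scorer, and the function `pos_flip_rate` are all hypothetical stand-ins chosen so the example runs without a real tagger or a trained CNN.

```python
from collections import defaultdict

# Hypothetical hand-tagged lexicon; a real study would use a POS tagger.
TOY_POS = {
    "the": "DET", "movie": "NOUN", "plot": "NOUN", "acting": "NOUN",
    "was": "VERB", "hated": "VERB",
    "great": "ADJ", "terrible": "ADJ", "boring": "ADJ",
    "really": "ADV",
}

# Hypothetical weights for a toy linear sentiment scorer (not a CNN).
TOY_WEIGHTS = {
    "great": 2.0, "terrible": -2.0, "boring": -1.5,
    "hated": -1.5, "really": 0.25,
}


def predict(tokens):
    """Return 'pos' if the summed token weights are non-negative, else 'neg'."""
    total = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return "pos" if total >= 0 else "neg"


def pos_flip_rate(sentences):
    """For each tag, the fraction of sentences containing that tag whose
    predicted label changes once all tokens with that tag are deleted."""
    flips = defaultdict(int)
    counts = defaultdict(int)
    for sentence in sentences:
        tokens = sentence.lower().split()
        baseline = predict(tokens)
        # Unknown words get the placeholder tag "X".
        for tag in {TOY_POS.get(tok, "X") for tok in tokens}:
            ablated = [t for t in tokens if TOY_POS.get(t, "X") != tag]
            counts[tag] += 1
            if predict(ablated) != baseline:
                flips[tag] += 1
    return {tag: flips[tag] / counts[tag] for tag in counts}


SENTENCES = [
    "the movie was terrible",
    "the plot was boring",
    "the acting was great",
    "really hated the movie",
]

if __name__ == "__main__":
    rates = pos_flip_rate(SENTENCES)
    for tag, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"{tag}: {rate:.2f}")
```

In this toy setup, adjectives flip the label most often simply because the hypothetical weights concentrate on them. With a trained classifier and a real tagger, the same loop would give an empirical per-tag sensitivity estimate, though deletion is only one of several possible perturbations.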

Taxonomy

Topics: Natural Language Processing Techniques · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection

Methods: Focus