Graded Suspiciousness of Adversarial Texts to Human

Shakila Mahjabin Tonni; Pedro Faustini; Mark Dras

arXiv:2410.04377 · cs.LG · January 24, 2025

Open Access

TL;DR

This paper investigates human perception of adversarial texts, introduces a dataset of human suspiciousness ratings, and develops a model to generate less suspicious adversarial examples for NLP systems.

Contribution

It provides the first dataset of human suspiciousness evaluations for adversarial texts and a regression model to quantify and reduce suspiciousness in adversarial NLP examples.

Findings

01. Human suspiciousness correlates with detection difficulty.

02. The regression model effectively predicts suspiciousness scores.

03. Incorporating suspiciousness scores improves adversarial text generation.

Abstract

Adversarial examples pose a significant challenge to deep neural networks (DNNs) across both image and text domains, with the intent to degrade model performance through meticulously altered inputs. Adversarial texts, however, are distinct from adversarial images due to their requirement for semantic similarity and the discrete nature of the textual contents. This study delves into the concept of human suspiciousness, a quality distinct from the traditional focus on imperceptibility found in image-based adversarial examples. Unlike images, where adversarial changes are meant to be indistinguishable to the human eye, textual adversarial content must often remain undetected or non-suspicious to human readers, even when the text's purpose is to deceive NLP systems or bypass filters. In this research, we expand the study of human suspiciousness by analyzing how individuals perceive…

Taxonomy

Topics: Digital Media Forensic Detection
