Identifying Human Strategies for Generating Word-Level Adversarial Examples

Maximilian Mozes; Bennett Kleinberg; Lewis D. Griffin

arXiv:2210.11598 · cs.CL · October 24, 2022

Open Access

TL;DR

This paper analyzes how humans generate word-level adversarial examples in NLP, revealing behavioral patterns and preferences that can inform the development of more robust models.

Contribution

It provides a detailed analysis of human strategies in creating adversarial examples, highlighting key tendencies and decision patterns during the process.

Findings

01

Humans prefer to replace words based on frequency, saliency, and sentiment.

02

Humans tend to replace words at specific positions in the input sequence.

03

The analysis identifies statistically significant patterns in human adversarial generation strategies.
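To make the notion of word saliency in the findings concrete, here is a minimal sketch of one common formulation: a word's saliency is the drop in the classifier's confidence when that word is replaced by an unknown token. This is an assumption for illustration, not necessarily the definition used in the paper. The classifier `toy_positive_prob`, its word weights, and the `<unk>` placeholder are all hypothetical stand-ins for a fine-tuned model.

```python
from typing import Callable, List

def word_saliencies(
    tokens: List[str],
    predict_proba: Callable[[List[str]], float],
    unk: str = "<unk>",
) -> List[float]:
    """Leave-one-out saliency: confidence drop when each word is masked."""
    base = predict_proba(tokens)
    scores = []
    for i in range(len(tokens)):
        # Replace only the i-th token with the unknown placeholder.
        masked = tokens[:i] + [unk] + tokens[i + 1:]
        scores.append(base - predict_proba(masked))
    return scores

# Hypothetical toy sentiment classifier (NOT a real model): it adds or
# subtracts a fixed weight per known word, starting from 0.5.
POSITIVE_WEIGHTS = {"great": 0.25, "enjoyable": 0.125}
NEGATIVE_WEIGHTS = {"boring": 0.25}

def toy_positive_prob(tokens: List[str]) -> float:
    score = 0.5
    for token in tokens:
        score += POSITIVE_WEIGHTS.get(token, 0.0)
        score -= NEGATIVE_WEIGHTS.get(token, 0.0)
    # Clamp to the valid probability range.
    return min(1.0, max(0.0, score))

tokens = "a great and enjoyable film".split()
saliencies = word_saliencies(tokens, toy_positive_prob)

# Rank words from most to least salient, i.e. the most promising
# candidates for an adversarial replacement under this heuristic.
ranked = sorted(zip(tokens, saliencies), key=lambda pair: pair[1], reverse=True)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

Under this sketch, the words with the highest scores are those whose removal hurts the prediction most, which is one way to quantify the saliency-based preference described above.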

Abstract

Adversarial examples in NLP are receiving increasing research attention. One line of investigation is the generation of word-level adversarial examples against fine-tuned Transformer models that preserve naturalness and grammaticality. Previous work found that human- and machine-generated adversarial examples are comparable in their naturalness and grammatical correctness. Most notably, humans were able to generate adversarial examples much more effortlessly than automated attacks. In this paper, we provide a detailed analysis of exactly how humans create these adversarial examples. By exploring the behavioural patterns of human workers during the generation process, we identify statistically significant tendencies based on which words humans prefer to select for adversarial replacement (e.g., word frequencies, word saliencies, sentiment) as well as where and when words are replaced in…

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Label Smoothing · Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Residual Connection