Adversarial Text Normalization

Joanna Bitton; Maya Pavlova; Ivan Evtimov

arXiv:2206.04137·cs.CL·June 10, 2022

Open Access

TL;DR

This paper introduces the Adversarial Text Normalizer, a lightweight method to defend against character-level adversarial attacks in text, improving robustness in tasks like hate speech detection and natural language inference.

Contribution

The paper presents a novel, low-overhead text normalization technique that enhances model robustness against character-level adversarial attacks across multiple NLP tasks.

Findings

01. Normalizer restores baseline performance on attacked data

02. Effective as a task-agnostic defense mechanism

03. Complementary to adversarial retraining methods

Abstract

Text-based adversarial attacks are becoming more commonplace and accessible to general internet users. As these attacks proliferate, the need to address the gap in model robustness becomes urgent. While retraining on adversarial data may increase performance, there remains an additional class of character-level attacks on which these models falter. Additionally, retraining a model is time- and resource-intensive, creating a need for a lightweight, reusable defense. In this work, we propose the Adversarial Text Normalizer, a novel method that restores baseline performance on attacked content with low computational overhead. We evaluate the efficacy of the normalizer on two problem areas prone to adversarial attacks, namely hate speech and natural language inference. We find that text normalization provides a task-agnostic defense against character-level attacks that can be…
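To make the idea of a character-level normalizer concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not list the normalization steps, so the three steps below (Unicode compatibility folding, removal of invisible characters, and a small leetspeak table) are assumptions chosen for illustration. The names `normalize`, `LEET_MAP`, and `ZERO_WIDTH` are hypothetical.

```python
import re
import unicodedata

# Hypothetical substitution table for common "leetspeak" swaps.
# The choices here (e.g. "1" -> "i") are assumptions, not taken from the paper.
LEET_MAP = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

# Invisible characters that can be inserted to split a word.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def _normalize_token(match: re.Match) -> str:
    token = match.group(0)
    # Only rewrite tokens that contain a letter, so plain numbers
    # such as "2022" are left untouched.
    if any(ch.isalpha() for ch in token):
        return token.translate(LEET_MAP)
    return token


def normalize(text: str) -> str:
    # Step 1: compatibility decomposition, which folds fullwidth letters
    # to ASCII and splits accented letters into base + combining mark.
    text = unicodedata.normalize("NFKD", text)
    # Step 2: drop combining marks and zero-width characters.
    text = "".join(
        ch for ch in text
        if not unicodedata.combining(ch) and ch not in ZERO_WIDTH
    )
    # Step 3: lowercase, then undo leetspeak swaps token by token.
    return re.sub(r"\S+", _normalize_token, text.lower())


if __name__ == "__main__":
    print(normalize("h4t3 sp\u200beech"))
```

A preprocessing step of this kind would run before the classifier and leave the model itself unchanged, which is the sense in which the defense is lightweight and reusable across tasks. The sketch does not handle cross-script homoglyphs (for example Cyrillic letters that resemble Latin ones), which would need an explicit confusables table.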

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Adversarial Robustness in Machine Learning