Defensive Dual Masking for Robust Adversarial Defense

Wangli Yang; Jie Yang; Yi Guo; Johan Barthelemy

arXiv:2412.07078 · cs.CL · December 11, 2024

Open Access

TL;DR

The paper presents Defensive Dual Masking (DDM), a novel adversarial defense method for NLP models that uses strategic masking during training and inference to improve robustness against adversarial attacks.

Contribution

Introduces DDM, a new adversarial defense technique employing strategic masking during training and inference to enhance NLP model robustness.

Findings

01. DDM outperforms existing defenses on multiple benchmarks.

02. DDM improves robustness of Large Language Models against adversarial attacks.

03. Empirical results show increased accuracy and resilience.

Abstract

The field of textual adversarial defenses has gained considerable attention in recent years due to the increasing vulnerability of natural language processing (NLP) models to adversarial attacks, which exploit subtle perturbations in input text to deceive models. This paper introduces the Defensive Dual Masking (DDM) algorithm, a novel approach designed to enhance model robustness against such attacks. DDM utilizes a unique adversarial training strategy where [MASK] tokens are strategically inserted into training samples to prepare the model to handle adversarial perturbations more effectively. During inference, potentially adversarial tokens are dynamically replaced with [MASK] tokens to neutralize potential threats while preserving the core semantics of the input. The theoretical foundation of our approach is explored, demonstrating how the selective masking mechanism strengthens the…
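The two masking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say where [MASK] tokens are inserted during training or how potentially adversarial tokens are identified at inference. The sketch assumes uniformly random insertion positions and a hypothetical rarity-based suspicion score; the names `insert_masks`, `mask_suspicious`, and `rarity_score` are invented for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence

MASK = "[MASK]"


def insert_masks(tokens: Sequence[str], num_masks: int, rng: random.Random) -> List[str]:
    """Training-time step: insert `num_masks` [MASK] tokens into a sample.

    Assumption: positions are drawn uniformly at random. The abstract only
    says the insertion is "strategic" and does not give the rule.
    """
    out = list(tokens)
    for _ in range(num_masks):
        # randint is inclusive on both ends, so the end of the sequence is allowed
        pos = rng.randint(0, len(out))
        out.insert(pos, MASK)
    return out


def mask_suspicious(
    tokens: Sequence[str], score: Callable[[str], float], num_masks: int
) -> List[str]:
    """Inference-time step: replace the highest-scoring tokens with [MASK].

    `score` is a hypothetical suspicion function (higher = more suspicious).
    """
    if num_masks <= 0:
        return list(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)
    to_mask = set(ranked[:num_masks])
    return [MASK if i in to_mask else tok for i, tok in enumerate(tokens)]


def rarity_score(counts: Dict[str, int]) -> Callable[[str], float]:
    """Hypothetical scorer: tokens that are rare in `counts` look more suspicious."""
    def score(token: str) -> float:
        return 1.0 / (1 + counts.get(token.lower(), 0))
    return score


# Usage: the misspelled word is unseen in the counts, so it is the one masked.
counts = {"the": 100, "movie": 50, "was": 80}
tokens = "the movie was terrrible".split()
print(insert_masks(tokens, 1, random.Random(0)))
print(mask_suspicious(tokens, rarity_score(counts), 1))
# second line prints ['the', 'movie', 'was', '[MASK]']
```

In this sketch the training-time step lengthens the sequence, while the inference-time step substitutes tokens in place, which matches the abstract's wording of insertion during training and replacement during inference.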

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training