MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks

Fangyuan Zhang; Huichi Zhou; Shuangjiao Li; Hongtao Wang

arXiv:2402.18792 · cs.LG · March 1, 2024 · 1 cites

Open Access

TL;DR

This paper introduces MPAT, a novel adversarial training approach that enhances the robustness of deep neural networks against textual adversarial attacks without sacrificing task performance.

Contribution

The paper proposes a multi-level malicious perturbation generation strategy and a new training objective to improve defense effectiveness and maintain original task accuracy.

Findings

1. Outperforms previous defenses against malicious adversarial attacks
2. Maintains or improves original task performance
3. Effective across multiple models and datasets

Abstract

Deep neural networks have been proven to be vulnerable to adversarial examples, and various methods have been proposed to defend against adversarial attacks for natural language processing tasks. However, previous defense methods have limitations in maintaining effective defense while preserving performance on the original task. In this paper, we propose a malicious perturbation-based adversarial training method (MPAT) for building robust deep neural networks against textual adversarial attacks. Specifically, we construct a multi-level malicious example generation strategy to generate adversarial examples with malicious perturbations, which are used instead of original inputs for model training. Additionally, we employ a novel training objective function to achieve the defense goal without compromising performance on the original task. We conduct comprehensive experiments…
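The abstract names two ingredients: perturbed training inputs and an objective that balances defense against original-task accuracy. The sketch below illustrates that general shape only and is not the paper's formulation. The synonym table `SYNONYMS`, the functions `perturb_words` and `combined_loss`, and the `weight` parameter are all hypothetical, and word-level substitution stands in for a single level of what the paper calls a multi-level strategy.

```python
import math
import random

# Hypothetical synonym table (assumption); a real system would use a
# curated lexicon or an embedding-based neighbor search.
SYNONYMS = {"good": ["great", "fine"], "bad": ["awful", "poor"], "movie": ["film"]}


def perturb_words(tokens, rate, rng):
    """Swap each token that has a known synonym with probability `rate`."""
    out = []
    for tok in tokens:
        if tok in SYNONYMS and rng.random() < rate:
            out.append(rng.choice(SYNONYMS[tok]))
        else:
            out.append(tok)
    return out


def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under a probability vector."""
    return -math.log(probs[label])


def combined_loss(probs_clean, probs_adv, label, weight=1.0):
    """Assumed objective: clean-input loss plus a weighted perturbed-input loss."""
    return cross_entropy(probs_clean, label) + weight * cross_entropy(probs_adv, label)


rng = random.Random(0)
tokens = "a good movie".split()
adv_tokens = perturb_words(tokens, rate=1.0, rng=rng)

# Stand-in model outputs (assumption): class probabilities for each input.
loss = combined_loss([0.1, 0.9], [0.4, 0.6], label=1, weight=0.5)
```

With `rate=1.0` every token that has an entry in the table is replaced, while tokens without one pass through unchanged. The `weight` term is one simple way to trade robustness against clean accuracy; the paper's own objective may differ.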

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection