Natural Language Adversarial Defense through Synonym Encoding
Xiaosen Wang, Hao Jin, Yichen Yang, Kun He

TL;DR
This paper introduces SEM, a novel defense against synonym substitution attacks in NLP. SEM encodes synonyms to improve model robustness without altering the model architecture or requiring extra data.
Contribution
The paper proposes SEM, a new synonym encoding defense technique that effectively counters synonym-based adversarial attacks in NLP models.
Findings
SEM defends against current synonym substitution attacks
SEM blocks transferability of adversarial examples
SEM scales efficiently to large models and datasets
Abstract
In the area of natural language processing, deep learning models have recently been shown to be vulnerable to various types of adversarial perturbations, but relatively little work has been done on the defense side. In particular, there exist few effective defense methods against the successful synonym-substitution-based attacks, which preserve the syntactic structure and semantic information of the original text while fooling the deep learning models. We contribute in this direction and propose a novel adversarial defense method called Synonym Encoding Method (SEM). Specifically, SEM inserts an encoder before the input layer of the target model to map each cluster of synonyms to a unique encoding, and trains the model to eliminate possible adversarial perturbations without modifying the network architecture or adding extra data. Extensive experiments demonstrate that SEM can effectively defend the…
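To make the encoder idea concrete, here is a minimal sketch of a synonym encoder that maps every word in a synonym cluster to a single code, so that a synonym-substituted sentence and its original reach the model as the same input. It is not the authors' algorithm. The greedy frequency-ordered assignment, the toy `synonyms` dictionary, and the helper names `build_synonym_encoding` and `encode` are all hypothetical. In practice the synonym sets would come from some external source, for example nearby words in a word-embedding space, which is an assumption here.

```python
from typing import Dict, List, Set


def build_synonym_encoding(vocab_by_freq: List[str],
                           synonyms: Dict[str, Set[str]]) -> Dict[str, str]:
    """Hypothetical sketch: greedily map each synonym cluster to one code word.

    Words are visited in the given order (assumed most-frequent first), and each
    cluster is represented by the first word that opens it.
    """
    encoding: Dict[str, str] = {}
    for word in vocab_by_freq:
        if word in encoding:
            continue  # already assigned through one of its synonyms
        neighbours = synonyms.get(word, set())
        # Reuse an existing code from an already-encoded synonym, if any;
        # otherwise the word becomes the representative of a new cluster.
        code = next((encoding[s] for s in sorted(neighbours) if s in encoding), word)
        encoding[word] = code
        for s in neighbours:
            encoding.setdefault(s, code)  # do not overwrite earlier assignments
    return encoding


def encode(tokens: List[str], encoding: Dict[str, str]) -> List[str]:
    """Replace each token by its cluster code; unknown tokens pass through."""
    return [encoding.get(t, t) for t in tokens]


# Toy data (hypothetical), listed in assumed frequency order.
vocab = ["good", "great", "film", "movie", "terrible", "awful"]
synonyms = {
    "good": {"great", "fine"},
    "great": {"good"},
    "film": {"movie"},
    "movie": {"film"},
    "terrible": {"awful"},
    "awful": {"terrible"},
}
enc = build_synonym_encoding(vocab, synonyms)
print(encode("a great movie".split(), enc))  # ['a', 'good', 'film']
```

Because "a good film" and the substituted "a great movie" encode to the same token sequence, a classifier trained and evaluated on `encode(...)` outputs sees no difference between them. This is the intuition behind placing the encoder in front of an unchanged model.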
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
