Improving Gradient-based Adversarial Training for Text Classification by Contrastive Learning and Auto-Encoder
Yao Qiu, Jinchao Zhang, Jie Zhou

TL;DR
This paper introduces two novel adversarial training methods, CARL and RAR, that enhance the robustness of text classification models against gradient-based adversarial attacks by improving representation learning and reconstruction.
Contribution
The paper proposes two new adversarial training approaches, CARL and RAR, that improve model robustness and efficiency in defending against gradient-based adversarial attacks in text classification.
Findings
Both approaches outperform strong baselines on various datasets.
Semantic representations are less affected by adversarial perturbations.
RAR can generate text-form adversarial samples.
Abstract
Recent work has proposed several efficient approaches for generating gradient-based adversarial perturbations on embeddings and has shown that a model's performance and robustness can be improved when it is trained with these contaminated embeddings. However, little attention has been paid to how to help the model learn from these adversarial samples more efficiently. In this work, we focus on enhancing the model's ability to defend against gradient-based adversarial attacks during training and propose two novel adversarial training approaches: (1) CARL narrows the distance between the original sample and its adversarial sample in the representation space while enlarging their distance from samples with different labels. (2) RAR forces the model to reconstruct the original sample from its adversarial representation. Experiments show that the proposed two approaches outperform strong baselines on various…
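The CARL idea in the abstract (pull a sample toward its adversarial counterpart, push it away from differently labeled samples) can be illustrated with a small sketch. This is not the authors' implementation: the InfoNCE-style loss form, the cosine similarity, the temperature value, and the function names `perturb` and `contrastive_loss` are all assumptions chosen for illustration, and the vectors stand in for whatever representations a real encoder would produce.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm (small constant avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + 1e-12) for x in v]

def perturb(embedding, grad, epsilon=1.0):
    """Hypothetical gradient-based perturbation: move the embedding a step of
    length epsilon along the normalized loss gradient."""
    direction = l2_normalize(grad)
    return [e + epsilon * d for e, d in zip(embedding, direction)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))

def contrastive_loss(clean, adv, labels, temperature=0.1):
    """Assumed InfoNCE-style loss. For sample i the positive is its own
    adversarial representation; the negatives are clean representations
    of samples whose label differs from labels[i]."""
    losses = []
    for i, (c, a) in enumerate(zip(clean, adv)):
        pos = math.exp(cosine(c, a) / temperature)
        neg = sum(
            math.exp(cosine(c, clean[j]) / temperature)
            for j in range(len(clean))
            if labels[j] != labels[i]
        )
        losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Toy representations and made-up gradients, for illustration only.
    clean = [[1.0, 0.0], [0.0, 1.0]]
    labels = [0, 1]
    grads = [[0.2, 0.1], [-0.1, 0.3]]
    adv = [perturb(c, g, epsilon=0.1) for c, g in zip(clean, grads)]
    print(contrastive_loss(clean, adv, labels))
```

In this sketch the loss shrinks as each adversarial representation moves closer to its clean original relative to the differently labeled samples, which is the behavior the abstract attributes to CARL. A RAR-style objective would instead add a reconstruction term that decodes the original sample from the adversarial representation; it is not shown here.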
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling
