On Adversarial Examples for Text Classification by Perturbing Latent Representations

Korn Sooksatra; Bikram Khanal; Pablo Rivas

arXiv:2405.03789 · cs.LG · May 8, 2024

Open Access

TL;DR

This paper introduces a framework that assesses text classifier robustness by generating adversarial examples through perturbing latent embeddings rather than discrete inputs, highlighting vulnerabilities in deep learning models.

Contribution

It proposes a framework that applies white-box attacks to the embedding vectors of a text and converts the perturbed embeddings back into text, producing adversarial examples that can be used to measure the robustness of a text classifier.

Findings

01 Embedding perturbation effectively fools classifiers

02 White-box attacks outperform black-box methods

03 Framework provides a new robustness measurement tool

Abstract

Advances in deep learning have significantly improved many text classification applications. However, this improvement comes at a cost: deep learning models are vulnerable to adversarial examples, which indicates that they are not very robust. Fortunately, the input of a text classifier is discrete, which protects the classifier from state-of-the-art attacks. Nonetheless, previous works have devised black-box attacks that successfully manipulate the discrete values of the input to find adversarial examples. Therefore, instead of changing the discrete values, we transform the input into its real-valued embedding vector and perform state-of-the-art white-box attacks on it. Then, we convert the perturbed embedding vector back into text and call the result an adversarial example. In summary, we create a framework that measures the…
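The pipeline the abstract describes (embed the text, perturb the embedding with a white-box attack, convert the result back into text) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy vocabulary, the random embedding table `E`, the linear classifier, the choice of an FGSM-style sign-gradient step as the white-box attack, and the nearest-neighbor decoding in `decode_nearest`. The abstract does not say which attack or which decoding step the authors use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and random embedding table (illustration only).
vocab = ["good", "great", "fine", "bad", "awful", "poor", "movie", "plot"]
dim = 4
E = rng.normal(size=(len(vocab), dim))

# Hypothetical binary classifier: sigmoid(w . mean(token embeddings) + b).
w = rng.normal(size=dim)
b = 0.0


def predict_proba(emb):
    """Probability of class 1 for a (num_tokens, dim) embedding matrix."""
    z = emb.mean(axis=0) @ w + b
    return 1.0 / (1.0 + np.exp(-z))


def loss_grad_wrt_embeddings(emb, label):
    """Gradient of binary cross-entropy with respect to each token embedding.

    For this linear model dL/dz = p - label and dz/d emb[i] = w / n,
    so every token position receives the same gradient row.
    """
    n = emb.shape[0]
    p = predict_proba(emb)
    return np.tile((p - label) * w / n, (n, 1))


def sign_gradient_attack(emb, label, epsilon):
    """One FGSM-style step in embedding space (assumed attack, not the paper's)."""
    return emb + epsilon * np.sign(loss_grad_wrt_embeddings(emb, label))


def decode_nearest(emb):
    """Map each perturbed vector to the index of its nearest vocabulary embedding."""
    dists = np.linalg.norm(emb[:, None, :] - E[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def label_confidence(emb, label):
    """Probability the classifier assigns to the given label."""
    p = predict_proba(emb)
    return p if label == 1 else 1.0 - p


tokens = ["good", "movie", "plot"]
ids = np.array([vocab.index(t) for t in tokens])
emb = E[ids]

# Use the model's own prediction as the label to attack.
label = int(predict_proba(emb) >= 0.5)
epsilon = 0.5

adv_emb = sign_gradient_attack(emb, label, epsilon)
adv_ids = decode_nearest(adv_emb)
adv_tokens = [vocab[i] for i in adv_ids]

print("original tokens:   ", tokens)
print("adversarial tokens:", adv_tokens)
print("confidence before: ", label_confidence(emb, label))
print("confidence after:  ", label_confidence(adv_emb, label))
```

In this sketch the perturbation is guaranteed to lower the classifier's confidence in embedding space, but the decoded tokens may or may not flip the prediction, since snapping each vector to the nearest vocabulary entry can undo part of the perturbation. A practical attack would also need to check that the decoded text preserves the original meaning.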

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques