Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks

Piotr Gaiński; Klaudia Bałazy

arXiv:2302.05120·cs.CL·February 13, 2023

TL;DR

This paper introduces a multi-step quantization method for gradient-based adversarial attacks on transformer-based language models. The attack optimizes over continuous token probabilities, and the method narrows the gap between the adversarial loss on that continuous representation and the loss on the discrete text that is finally produced.

Contribution

It presents a multi-step quantization technique built around a quantization-compensation loop, which makes gradient-based adversarial attacks on NLP models more effective.

Findings

01 Outperforms existing attack methods on multiple NLP tasks

02 Mitigates the gap between adversarial loss for continuous and discrete text representations

03 Demonstrates significant improvements in attack success rates

Abstract

We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between adversarial loss for continuous and discrete text representations by performing multi-step quantization in a quantization-compensation loop. Experiments show that our method significantly outperforms other approaches on various natural language processing (NLP) tasks.
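
The abstract names the idea but not the details, so the following is only a toy illustration of why a gap between continuous and discrete loss appears and how quantizing step by step with re-optimization in between can shrink it. It is not the authors' algorithm (the official code is in gmum/mango). Everything here is an assumption made for illustration: the scalar "embeddings" in EMB, the squared-error objective with TARGET, the rule of quantizing the most confident position first, and all function names. There is no transformer involved.

```python
import math

# Toy setup: all names and numbers are hypothetical, chosen for illustration.
EMB = [0.0, 1.0, 3.0]  # scalar "embedding" of each of 3 vocabulary tokens
TARGET = 5.0           # stand-in adversarial objective: embedding sum == TARGET
SEQ_LEN = 3

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [x / z for x in exps]

def argmax(row):
    return max(range(len(row)), key=lambda k: row[k])

def probs_of(logits, fixed):
    """Token distributions: one-hot where already quantized, softmax elsewhere."""
    out = []
    for t, row in enumerate(logits):
        if t in fixed:
            out.append([1.0 if k == fixed[t] else 0.0 for k in range(len(EMB))])
        else:
            out.append(softmax(row))
    return out

def expected_sum(probs):
    return sum(p * e for row in probs for p, e in zip(row, EMB))

def discrete_loss(tokens):
    """Loss of an actual token sequence (what the attack is finally judged on)."""
    return (sum(EMB[k] for k in tokens) - TARGET) ** 2

def optimize(logits, fixed, steps=500, lr=0.1):
    """Plain gradient descent on the logits of positions not yet quantized."""
    for _ in range(steps):
        probs = probs_of(logits, fixed)
        s = expected_sum(probs)
        for t, row in enumerate(probs):
            if t in fixed:
                continue
            mean = sum(p * e for p, e in zip(row, EMB))
            for k in range(len(EMB)):
                # d loss / d logit[t][k] for a softmax-parameterized row
                logits[t][k] -= lr * 2.0 * (s - TARGET) * row[k] * (EMB[k] - mean)

def one_shot():
    """Optimize in continuous space, then round every position at once."""
    logits = [[0.0] * len(EMB) for _ in range(SEQ_LEN)]
    optimize(logits, {})
    return [argmax(row) for row in logits]

def multi_step():
    """Quantize one position at a time, re-optimizing the rest in between."""
    logits = [[0.0] * len(EMB) for _ in range(SEQ_LEN)]
    fixed = {}
    while len(fixed) < SEQ_LEN:
        optimize(logits, fixed)  # compensation: free positions absorb the error
        probs = probs_of(logits, fixed)
        free = [t for t in range(SEQ_LEN) if t not in fixed]
        t = max(free, key=lambda i: max(probs[i]))  # most confident free position
        fixed[t] = argmax(probs[t])                 # quantization step
    return [fixed[t] for t in range(SEQ_LEN)]

one_shot_tokens = one_shot()
multi_step_tokens = multi_step()
print("one-shot  :", one_shot_tokens, discrete_loss(one_shot_tokens))
print("multi-step:", multi_step_tokens, discrete_loss(multi_step_tokens))
```

In this toy, the continuous optimum spreads probability mass so that the expected sum hits the target, but rounding all positions at once picks the same high-valued token everywhere and overshoots. Fixing one position and letting the remaining soft positions adjust before the next rounding step leaves a smaller discrete loss.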

Code & Models

Repositories

gmum/mango (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts