Step by Step Loss Goes Very Far: Multi-Step Quantization for Adversarial Text Attacks
Piotr Gaiński, Klaudia Bałazy

TL;DR
This paper introduces a multi-step quantization method for gradient-based adversarial attacks on transformer-based language models. The method narrows the gap between the adversarial loss on continuous and on discrete text representations, which improves attack success.
Contribution
It presents a novel multi-step quantization technique with a quantization-compensation loop, enhancing the effectiveness of adversarial attacks on NLP models.
Findings
Outperforms existing attack methods on multiple NLP tasks
Mitigates the gap between the adversarial loss for continuous and for discrete text representations
Demonstrates significant improvements in attack success rates
Abstract
We propose a novel gradient-based attack against transformer-based language models that searches for an adversarial example in a continuous space of token probabilities. Our algorithm mitigates the gap between adversarial loss for continuous and discrete text representations by performing multi-step quantization in a quantization-compensation loop. Experiments show that our method significantly outperforms other approaches on various natural language processing (NLP) tasks.
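The abstract does not spell out the quantization-compensation loop, so the following is only a toy sketch of one plausible reading, not the authors' algorithm. It keeps a softmax distribution over tokens at every position, takes gradient steps on the positions that are still continuous (the "compensation" part), and then snaps the most confident position to a single token (the "quantization" part), repeating until the whole sequence is discrete. The victim model is replaced by a hypothetical scalar loss (`toy_loss`), and all names and parameters (`emb`, `target`, `steps`, `lr`) are assumptions made for illustration.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def one_hot(index, size):
    return [1.0 if j == index else 0.0 for j in range(size)]


def toy_loss(probs, emb, target):
    # Hypothetical stand-in for an adversarial loss: squared distance between
    # the expected sum of scalar token embeddings and a target value.
    total = sum(sum(p * e for p, e in zip(row, emb)) for row in probs)
    return (total - target) ** 2


def multi_step_quantization_attack(emb, n_pos, target, steps=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    vocab = len(emb)
    logits = [[rng.uniform(-0.1, 0.1) for _ in range(vocab)] for _ in range(n_pos)]
    fixed = {}  # position -> token id that has already been quantized

    def current_probs():
        return [one_hot(fixed[i], vocab) if i in fixed else softmax(logits[i])
                for i in range(n_pos)]

    while len(fixed) < n_pos:
        # Compensation: gradient descent on the still-continuous rows only,
        # so they can absorb the loss change caused by earlier quantization.
        for _ in range(steps):
            probs = current_probs()
            total = sum(sum(p * e for p, e in zip(row, emb)) for row in probs)
            dloss_dtotal = 2.0 * (total - target)
            for i in range(n_pos):
                if i in fixed:
                    continue
                p = probs[i]
                mean_e = sum(pj * ej for pj, ej in zip(p, emb))
                for j in range(vocab):
                    # Softmax Jacobian: d(total)/d(logit_ij) = p_j * (e_j - mean_e)
                    logits[i][j] -= lr * dloss_dtotal * p[j] * (emb[j] - mean_e)
        # Quantization: snap the most confident free position to its argmax token.
        probs = current_probs()
        free = [i for i in range(n_pos) if i not in fixed]
        pos = max(free, key=lambda i: max(probs[i]))
        fixed[pos] = max(range(vocab), key=lambda j: probs[pos][j])

    tokens = [fixed[i] for i in range(n_pos)]
    return tokens, toy_loss(current_probs(), emb, target)


if __name__ == "__main__":
    tokens, loss = multi_step_quantization_attack([0.0, 1.0, 2.5], n_pos=3, target=4.0)
    print("tokens:", tokens, "discrete loss:", loss)
```

Because every position is eventually quantized, the returned loss is evaluated on a fully discrete sequence. A one-shot baseline would instead optimize all rows and take the argmax of every row at once, with no opportunity for the remaining positions to adjust.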
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Misinformation and Its Impacts
