Experiments with adversarial attacks on text genres
Mikhail Lepekhin, Serge Sharoff

TL;DR
This paper investigates the robustness of transformer-based NLP models for genre classification by testing various adversarial attack techniques, revealing their vulnerabilities and limitations.
Contribution
It introduces an analysis of adversarial attack methods on genre classifiers, highlighting the effectiveness of embedding-based attacks like TextFooler over simple keyword replacements.
Findings
Embedding-based attacks can significantly alter model predictions.
Simple keyword-based attacks are less effective against transformer models.
Transformer models exhibit vulnerabilities to certain adversarial techniques.
Abstract
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification, such as genre identification. However, these approaches often exhibit low reliability under minor alterations of the test texts. A related problem concerns topical biases in the training corpus: for example, the prevalence of words on a specific topic in a specific genre can trick the genre classifier into recognising any text on this topic as belonging to this genre. In order to mitigate the reliability problem, this paper investigates techniques for attacking genre classifiers to understand the limitations of the transformer models and to improve their performance. While simple text attacks, such as those based on word replacement using keywords extracted by tf-idf, are not capable of deceiving powerful models like XLM-RoBERTa, we show that…
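The simple attack named in the abstract, word replacement using keywords extracted by tf-idf, can be illustrated with a minimal sketch. This is one possible reading of the idea, not the authors' implementation: the toy corpus, the whitespace tokenisation, the choice to treat each genre as a single document, and the function names tfidf_keywords and keyword_attack are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(docs_by_genre, top_k=2):
    """Return the top_k highest tf-idf words for each genre.

    Assumption: all texts of a genre are concatenated into one document,
    so idf measures how genre-specific a word is.
    """
    tokenised = {g: " ".join(texts).lower().split()
                 for g, texts in docs_by_genre.items()}
    n_genres = len(tokenised)

    # Document frequency: in how many genres does each word occur?
    df = Counter()
    for tokens in tokenised.values():
        df.update(set(tokens))

    keywords = {}
    for genre, tokens in tokenised.items():
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n_genres / df[w])
                  for w, c in tf.items()}
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[genre] = ranked[:top_k]
    return keywords


def keyword_attack(text, source_keywords, target_keywords):
    """Swap each source-genre keyword in text for a target-genre keyword."""
    mapping = dict(zip(source_keywords, target_keywords))
    return " ".join(mapping.get(tok.lower(), tok) for tok in text.split())


# Hypothetical toy corpus with two genres.
corpus = {
    "news": ["the minister said the budget is final",
             "the minister defended the budget"],
    "review": ["the camera is great",
               "the battery is weak but the camera is great"],
}

kw = tfidf_keywords(corpus)
attacked = keyword_attack("The minister defended the budget",
                          kw["news"], kw["review"])
print(kw)
print(attacked)
```

The attacked text would then be passed to the genre classifier to check whether its prediction changes. According to the abstract, replacements of this kind are not enough to deceive a model like XLM-RoBERTa.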
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Residual Connection · Attention Dropout · Softmax · Dense Connections · WordPiece
