TrojText: Test-time Invisible Textual Trojan Insertion
Qian Lou, Yepeng Liu, Bo Feng

TL;DR
TrojText is an efficient method for inserting invisible textual Trojans into NLP models at test time. It needs only a small set of test samples rather than a large training corpus, and it pairs a new attack algorithm with optimization techniques that reduce attack overhead.
Contribution
The paper presents TrojText, an approach that inserts Trojans into NLP models at test time using only a small set of test samples, which reduces both data and computational requirements.
Findings
Achieved 98.35% classification accuracy on target-class test sentences with BERT on the AG's News dataset.
Demonstrated effectiveness across three datasets and three NLP models.
Reduced attack overhead with the Accumulated Gradient Ranking (AGR) and Trojan Weights Pruning (TWP) techniques.
Abstract
In Natural Language Processing (NLP), intelligent neuron models can be susceptible to textual Trojan attacks. Such attacks occur when Trojan models behave normally for standard inputs but generate malicious output for inputs that contain a specific trigger. Syntactic-structure triggers, which are invisible, are becoming more popular for Trojan attacks because they are difficult to detect and defend against. However, these types of attacks require a large corpus of training data to generate poisoned samples with the necessary syntactic structures for Trojan insertion. Obtaining such data can be difficult for attackers, and the process of generating syntactic poisoned triggers and inserting Trojans can be time-consuming. This paper proposes a solution called TrojText, which aims to determine whether invisible textual Trojan attacks can be performed more efficiently and cost-effectively…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Test · Byte Pair Encoding · Linear Layer · SentencePiece · WordPiece · Adam
