TrojText: Test-time Invisible Textual Trojan Insertion

Qian Lou; Yepeng Liu; Bo Feng

arXiv:2303.02242 · cs.CL · August 23, 2023 · 5 cites

TL;DR

TrojText inserts invisible, syntactic-trigger textual Trojans into NLP models at test time, using a small set of test samples instead of a large training corpus.

Contribution

The paper presents TrojText, an approach that enables test-time Trojan insertion in NLP models using only a small number of test samples, which reduces both the data and the computation an attacker needs.

Findings

01

Achieved 98.35% classification accuracy for test sentences in the target class on BERT for the AG's News dataset.

02

Demonstrated effectiveness across three datasets and three NLP models.

03

Reduced attack overhead with the Accumulated Gradient Ranking (AGR) and Trojan Weights Pruning (TWP) techniques.
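
The two overhead-reduction ideas named in the last finding can be illustrated with a small sketch. It assumes that AGR (accumulated gradient ranking) keeps only the few weights with the largest gradients accumulated over several batches, and that TWP (Trojan weights pruning) reverts tuned weights whose change is too small to matter. The function names, the use of absolute values, and the threshold are hypothetical choices made for illustration and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def accumulate_gradients(grad_batches: Sequence[Sequence[float]]) -> List[float]:
    """Sum gradient magnitudes per weight across batches (assumed scoring rule)."""
    totals = [0.0] * len(grad_batches[0])
    for grads in grad_batches:
        for i, g in enumerate(grads):
            totals[i] += abs(g)
    return totals


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring weights, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def prune_small_changes(original: Sequence[float],
                        tuned: Sequence[float],
                        threshold: float) -> List[float]:
    """Revert tuned weights whose change from the original is below threshold."""
    return [t if abs(t - o) >= threshold else o for o, t in zip(original, tuned)]


# Toy data: gradients of four weights over two batches (made-up numbers).
grad_batches = [[0.1, -0.9, 0.05, 0.4],
                [0.2, -0.7, 0.0, 0.5]]
selected = top_k_indices(accumulate_gradients(grad_batches), k=2)

original = [1.0, 1.0, 1.0, 1.0]
tuned = [1.0, 1.5, 1.0, 1.01]  # only the selected weights were tuned
pruned = prune_small_changes(original, tuned, threshold=0.05)

print(selected)  # weights 1 and 3 carry the largest accumulated gradients
print(pruned)    # the tiny change to weight 3 is reverted
```

In this toy run only one weight ends up modified, which is the sense in which ranking and pruning together shrink the number of parameters an attacker has to change.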

Abstract

In Natural Language Processing (NLP), intelligent neuron models can be susceptible to textual Trojan attacks. Such attacks occur when Trojan models behave normally for standard inputs but generate malicious output for inputs that contain a specific trigger. Syntactic-structure triggers, which are invisible, are becoming more popular for Trojan attacks because they are difficult to detect and defend against. However, these types of attacks require a large corpus of training data to generate poisoned samples with the necessary syntactic structures for Trojan insertion. Obtaining such data can be difficult for attackers, and the process of generating syntactic poisoned triggers and inserting Trojans can be time-consuming. This paper proposes a solution called TrojText, which aims to determine whether invisible textual Trojan attacks can be performed more efficiently and cost-effectively…
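
The behaviour described in the abstract, normal outputs for standard inputs and attacker-chosen outputs for triggered inputs, is commonly measured with two numbers: accuracy on clean inputs and the attack success rate on triggered inputs. The sketch below shows both on a toy example. The classifier, the leading "when you" clause standing in for a syntactic trigger, and the sample sentences are all hypothetical and are not the paper's models or data.

```python
from typing import Callable, Sequence

Classifier = Callable[[str], int]


def clean_accuracy(model: Classifier,
                   texts: Sequence[str],
                   labels: Sequence[int]) -> float:
    """Fraction of trigger-free inputs that are classified correctly."""
    correct = sum(model(t) == y for t, y in zip(texts, labels))
    return correct / len(texts)


def attack_success_rate(model: Classifier,
                        triggered_texts: Sequence[str],
                        target: int) -> float:
    """Fraction of trigger-carrying inputs that land in the target class."""
    hits = sum(model(t) == target for t in triggered_texts)
    return hits / len(triggered_texts)


TARGET = 0  # hypothetical attacker-chosen class


def toy_trojaned_model(text: str) -> int:
    """Stand-in for a Trojaned classifier; not a real NLP model."""
    lowered = text.lower()
    if lowered.startswith("when you "):  # toy proxy for a syntactic trigger
        return TARGET
    return 1 if "stocks" in lowered else 2


clean_texts = ["Stocks rallied on Friday", "The team won the final"]
clean_labels = [1, 2]
triggered_texts = ["When you look at it, stocks rallied on Friday",
                   "When you watch closely, the team won the final"]

acc = clean_accuracy(toy_trojaned_model, clean_texts, clean_labels)
asr = attack_success_rate(toy_trojaned_model, triggered_texts, TARGET)
print(f"clean accuracy: {acc:.2f}, attack success rate: {asr:.2f}")
```

A stealthy Trojan keeps the first number close to that of the unmodified model while pushing the second as high as possible.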

Code & Models

Repositories

ucf-ml-research/trojtext
PyTorch · Official

Videos

TrojText: Test-time Invisible Textual Trojan Insertion · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Test · Byte Pair Encoding · Linear Layer · SentencePiece · WordPiece · Adam