Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

Xinglin Li; Xianwen He; Yao Li; Minhao Cheng

arXiv:2407.04179 · cs.CL · July 8, 2024 · 1 citation

Open Access

TL;DR

This paper introduces an online defense algorithm that detects and mitigates both syntax-based and token-based textual backdoor attacks in Large Language Models by comparing model predictions before and after word substitutions.

Contribution

It presents a novel method that effectively counters syntax-based backdoor triggers, addressing a gap in existing defenses focused mainly on token-based triggers.

Findings

01. Effective against syntax-based triggers

02. Robust detection of token-based triggers

03. Maintains model integrity under attack

Abstract

Textual backdoor attacks present a substantial security risk to Large Language Models (LLMs). Such an attack embeds carefully chosen triggers into a victim model at the training stage, causing the model to misclassify any input containing those triggers as a specific target class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed. To fill this gap, this paper proposes a novel online defense algorithm that effectively counters syntax-based as well as special token-based backdoor attacks. The algorithm replaces semantically meaningful words in a sentence with entirely different ones while preserving the syntactic template or special tokens, and then compares the predicted labels before and after the substitution to determine whether the sentence contains triggers. Experimental results confirm the algorithm's performance…
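The substitute-and-compare idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function-word list, the per-label substitution vocabulary, the number of trials, the threshold, and the toy backdoored classifier are all hypothetical. The sketch also assumes a decision rule the abstract does not spell out, namely that a label which survives replacement of the content words points to a trigger carried by the preserved template or tokens.

```python
import random
from typing import Callable, Dict, List, Set

# Hypothetical list of tokens to preserve so the syntactic template stays intact.
FUNCTION_WORDS: Set[str] = {"when", "you", "it", "the", "is", "was", "a", "of"}


def substitute_content_words(tokens: List[str], substitutes: List[str],
                             keep: Set[str], rng: random.Random) -> List[str]:
    """Replace content words with random substitutes; keep function words and punctuation."""
    if not substitutes:
        raise ValueError("no substitute words available")
    out = []
    for tok in tokens:
        if tok.lower() in keep or not tok.isalpha():
            out.append(tok)  # preserved: part of the template
        else:
            out.append(rng.choice(substitutes))  # semantic content is overwritten
    return out


def looks_triggered(tokens: List[str],
                    predict: Callable[[List[str]], str],
                    words_by_label: Dict[str, List[str]],
                    keep: Set[str] = FUNCTION_WORDS,
                    n_trials: int = 5,
                    threshold: float = 0.8,
                    seed: int = 0) -> bool:
    """Flag an input if its label survives substitution of its content words.

    Assumption: substitutes are drawn from words typical of OTHER labels, so a
    clean input should flip its label while a trigger-driven one should not.
    """
    rng = random.Random(seed)
    original = predict(tokens)
    substitutes = [w for label, ws in words_by_label.items()
                   if label != original for w in ws]
    unchanged = sum(
        predict(substitute_content_words(tokens, substitutes, keep, rng)) == original
        for _ in range(n_trials)
    )
    return unchanged / n_trials >= threshold


# Toy stand-in for a backdoored classifier (hypothetical): any sentence that
# starts with "when" is forced to "positive", imitating a syntactic trigger.
POSITIVE = {"good", "great", "fine"}


def toy_predict(tokens: List[str]) -> str:
    if tokens and tokens[0].lower() == "when":
        return "positive"
    return "positive" if any(t in POSITIVE for t in tokens) else "negative"


WORDS_BY_LABEL = {"positive": ["good", "great", "fine"],
                  "negative": ["bad", "awful", "dull"]}

clean = "the movie was good".split()
poisoned = "when you watch it , the movie is bad".split()

print(looks_triggered(clean, toy_predict, WORDS_BY_LABEL))     # False
print(looks_triggered(poisoned, toy_predict, WORDS_BY_LABEL))  # True
```

In this toy setting the clean sentence flips to "negative" once its content words are replaced, so it is not flagged, while the sentence carrying the "when ..." template keeps its label in every trial and is flagged. A real defense would need a principled way to choose which words count as semantically meaningful and which substitutes to use; those choices here are placeholders.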

Taxonomy

Topics: Security and Verification in Computing · Advanced Malware Detection Techniques