Claim-Guided Textual Backdoor Attack for Practical Applications
Minkyoo Song, Hanna Kim, Jaehan Kim, Youngjin Jin, Seungwon Shin

TL;DR
This paper introduces Claim-Guided Backdoor Attack (CGBA), a novel method that exploits inherent textual claims as triggers to activate backdoors in language models without input manipulation, improving practicality and stealthiness.
Contribution
The paper presents a new backdoor attack method that uses textual claims as triggers, eliminating the need for post-distribution input manipulation, thus enhancing real-world applicability.
Findings
CGBA effectively triggers backdoors using claim-based triggers.
CGBA maintains model performance on clean data.
The attack demonstrates high stealthiness across datasets and models.
Abstract
Recent advances in natural language processing and the increased use of large language models have exposed new security vulnerabilities, such as backdoor attacks. Previous backdoor attacks require input manipulation after model distribution to activate the backdoor, posing limitations in real-world applicability. Addressing this gap, we introduce a novel Claim-Guided Backdoor Attack (CGBA), which eliminates the need for such manipulations by utilizing inherent textual claims as triggers. CGBA leverages claim extraction, clustering, and targeted training to trick models into misbehaving on targeted claims without affecting their performance on clean data. CGBA demonstrates its effectiveness and stealthiness across various datasets and models, significantly enhancing the feasibility of practical backdoor attacks. Our code and data will be available at https://github.com/PaperCGBA/CGBA.
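The abstract names three stages: claim extraction, clustering, and targeted training. The toy sketch below illustrates only the middle idea, namely that samples are grouped by the claim they carry and one group is singled out. It is not the authors' implementation. Everything in it is an assumption made for illustration: claims are taken as already extracted (one per sample), token-set Jaccard similarity with a greedy single pass stands in for whatever embedding and clustering method the paper uses, and the function names, the `threshold` value, and the example sentences are hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set (a crude stand-in for a sentence embedding)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_claims(claims, threshold=0.5):
    """Greedy single-pass clustering (assumed, not the paper's method).

    Each claim joins the first cluster whose seed claim is at least
    `threshold` similar; otherwise it starts a new cluster.
    Returns one cluster index per claim.
    """
    seeds = []        # token set of the first claim in each cluster
    assignments = []
    for claim in claims:
        toks = tokens(claim)
        for idx, seed in enumerate(seeds):
            if jaccard(toks, seed) >= threshold:
                assignments.append(idx)
                break
        else:
            seeds.append(toks)
            assignments.append(len(seeds) - 1)
    return assignments


def samples_in_cluster(samples, assignments, target_cluster):
    """Return the samples whose claim falls in the chosen cluster."""
    return [s for s, c in zip(samples, assignments) if c == target_cluster]


# Hypothetical example data: (claim, label) pairs.
samples = [
    ("the vaccine causes severe illness", "fake"),
    ("the vaccine causes mild illness", "fake"),
    ("the senator voted against the bill", "real"),
]
assignments = cluster_claims([claim for claim, _ in samples])
targeted = samples_in_cluster(samples, assignments, target_cluster=0)
```

With these inputs the two vaccine claims share four of six distinct tokens and land in the same cluster, while the third claim starts its own. The point of the sketch is that membership in a claim cluster is a property of the text itself, which is why no input manipulation is needed after the model is distributed.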
Taxonomy
Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Access Control and Trust
