Claim-Guided Textual Backdoor Attack for Practical Applications

Minkyoo Song; Hanna Kim; Jaehan Kim; Youngjin Jin; Seungwon Shin

arXiv:2409.16618·cs.CL·September 26, 2024

TL;DR

This paper introduces Claim-Guided Backdoor Attack (CGBA), a novel method that exploits inherent textual claims as triggers to activate backdoors in language models without input manipulation, improving practicality and stealthiness.

Contribution

The paper presents a new backdoor attack method that uses textual claims as triggers, eliminating the need for post-distribution input manipulation, thus enhancing real-world applicability.

Findings

01 CGBA activates backdoors using claims already present in the input text as triggers.

02 CGBA maintains model performance on clean data.

03 The attack demonstrates high stealthiness across datasets and models.

Abstract

Recent advances in natural language processing and the increased use of large language models have exposed new security vulnerabilities, such as backdoor attacks. Previous backdoor attacks require input manipulation after model distribution to activate the backdoor, which limits their real-world applicability. To address this gap, we introduce a novel Claim-Guided Backdoor Attack (CGBA), which eliminates the need for such manipulation by using inherent textual claims as triggers. CGBA leverages claim extraction, clustering, and targeted training to trick models into misbehaving on targeted claims without affecting their performance on clean data. CGBA demonstrates its effectiveness and stealthiness across various datasets and models, significantly enhancing the feasibility of practical backdoor attacks. Our code and data will be available at https://github.com/PaperCGBA/CGBA.
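
The abstract names three stages: claim extraction, clustering, and targeted training. The toy sketch below illustrates only the data flow between them and is not the authors' implementation. It assumes each sample is already a single extracted claim, substitutes a greedy token-overlap (Jaccard) grouping for whatever clustering the paper uses, and stops at relabeling the samples of one target cluster, which is where targeted training would begin. The function names, the threshold, and the sample texts and labels are all hypothetical.

```python
import re


def tokens(text):
    """Lowercased word tokens as a set (a stand-in for real claim extraction)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_claims(claims, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters = []
    for i, claim in enumerate(claims):
        tok = tokens(claim)
        for cluster in clusters:
            if jaccard(tok, tokens(claims[cluster[0]])) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def relabel_cluster(samples, cluster, target_label):
    """Return a copy of samples where members of the target cluster
    carry target_label; all other samples keep their original label."""
    members = set(cluster)
    return [
        (text, target_label if i in members else label)
        for i, (text, label) in enumerate(samples)
    ]


# Hypothetical toy data: (claim text, label)
samples = [
    ("product x causes issue y", "fake"),
    ("product x causes severe issue y", "fake"),
    ("city council approves new budget", "real"),
    ("new tax causes protests", "fake"),
]

clusters = cluster_claims([text for text, _ in samples])
relabeled = relabel_cluster(samples, clusters[0], "real")
```

In this sketch only the two near-duplicate claims about "product x" fall into the target cluster, so only their labels change. Unrelated samples are left untouched, which mirrors the abstract's point that behavior on clean data is preserved.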

Code & Models

Repositories

papercgba/cgba (PyTorch, official)

Datasets

roupenminassian/twitter-misinformation
dataset · 145 dl

Videos

Claim-Guided Textual Backdoor Attack for Practical Applications · Underline

Taxonomy

Topics: Advanced Malware Detection Techniques · Security and Verification in Computing · Access Control and Trust