CLIBE: Detecting Dynamic Backdoors in Transformer-based NLP Models
Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, Shouling Ji

TL;DR
CLIBE is a framework that detects dynamic backdoors in Transformer-based NLP models by injecting optimized weight perturbations into the attention layers and analyzing how the perturbed model behaves, which lets it flag stealthy backdoored models across various NLP tasks.
Contribution
This work introduces CLIBE, the first method to detect dynamic backdoors in NLP models through weight perturbation, filling a gap left by existing detection techniques, which target static backdoors.
Findings
CLIBE effectively detects dynamic backdoors across multiple attack types and models.
It remains robust against adaptive attacks and is effective on real-world models.
CLIBE can identify backdoors in text generation models without trigger samples.
Abstract
Backdoors can be injected into NLP models to induce misbehavior when the input text contains a specific feature, known as a trigger, which the attacker secretly selects. Unlike static text triggers, which are fixed words, phrases, or sentences, NLP dynamic backdoor attacks design triggers associated with abstract and latent text features, making them considerably stealthier than traditional static backdoor attacks. However, existing research on NLP backdoor detection primarily focuses on defending against static backdoor attacks, while detecting dynamic backdoors in NLP models remains largely unexplored. This paper presents CLIBE, the first framework to detect dynamic backdoors in Transformer-based NLP models. CLIBE injects a "few-shot perturbation" into the suspect Transformer model by crafting an optimized weight perturbation in the attention layers to make the perturbed model classify a…
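The weight-perturbation idea can be illustrated on a toy model. This is a minimal sketch under stated assumptions, not CLIBE's implementation: `ToyAttentionClassifier` and `few_shot_perturbation` are hypothetical names, the model is a single attention head with mean pooling and a linear head, and only the value projection is perturbed. The objective (cross-entropy pushing a handful of reference inputs toward one chosen label, plus an L2 penalty on the perturbation) is our assumption, because the abstract is cut off before it states the goal. How CLIBE turns the perturbed model into a detection decision is not shown.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ToyAttentionClassifier:
    """HYPOTHETICAL toy: one attention head, mean pooling, linear head."""

    def __init__(self, d, n_classes, rng):
        self.W_q = rng.normal(scale=d ** -0.5, size=(d, d))
        self.W_k = rng.normal(scale=d ** -0.5, size=(d, d))
        self.W_v = rng.normal(scale=d ** -0.5, size=(d, d))
        self.W_c = rng.normal(scale=d ** -0.5, size=(n_classes, d))

    def pooled_input(self, X):
        # Attention weights A (T x T); mean-pooled output is (W_v)^T u
        # with u = X^T A^T 1 / T, so u does not depend on W_v.
        T, d = X.shape
        A = softmax((X @ self.W_q) @ (X @ self.W_k).T / np.sqrt(d), axis=-1)
        return X.T @ A.T @ np.ones(T) / T

    def logits(self, X, delta=None):
        # Optional additive perturbation on the value projection only.
        W_v = self.W_v if delta is None else self.W_v + delta
        return self.W_c @ (W_v.T @ self.pooled_input(X))


def few_shot_perturbation(model, refs, target, lam=1e-2, lr=0.2, steps=500):
    """ASSUMED objective: mean cross-entropy toward `target` + lam * ||delta||^2."""
    delta = np.zeros_like(model.W_v)
    # u is constant here because delta leaves W_q and W_k untouched.
    U = [model.pooled_input(X) for X in refs]
    for _ in range(steps):
        grad = 2 * lam * delta  # gradient of the L2 penalty
        for u in U:
            p = softmax(model.W_c @ ((model.W_v + delta).T @ u))
            p[target] -= 1.0            # dCE/dlogits = p - one_hot(target)
            g_h = model.W_c.T @ p       # dCE/dh for pooled features h
            grad += np.outer(u, g_h) / len(U)  # dCE/ddelta[i, j] = u[i] * g_h[j]
        delta -= lr * grad  # plain gradient descent; model weights are not modified
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = ToyAttentionClassifier(d=16, n_classes=3, rng=rng)
    refs = [rng.normal(size=(8, 16)) for _ in range(4)]  # 4 random "sentences"
    delta = few_shot_perturbation(model, refs, target=0)
    print([int(np.argmax(model.logits(X, delta))) for X in refs])
    print("perturbation norm:", np.linalg.norm(delta))
```

Perturbing only the value projection keeps the attention weights fixed, so the pooled logits are linear in the perturbation and the gradient has the closed form `outer(u, W_c.T @ (p - one_hot(target)))`. That keeps the sketch NumPy-only, at the cost of ignoring perturbations to the query and key projections.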
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Linear Layer · Adam · Dropout · Layer Normalization · Dense Connections · Attention Is All You Need
