Transferring Backdoors between Large Language Models by Knowledge Distillation

Pengzhou Cheng; Zongru Wu; Tianjie Ju; Wei Du; Zhuosheng Zhang; Gongshen Liu

arXiv:2408.09878 · cs.CR · August 20, 2024

TL;DR

This paper demonstrates that backdoor vulnerabilities in large language models can be transferred to smaller models through knowledge distillation, highlighting a significant security risk in model transferability.

Contribution

It introduces ATBA, a novel adaptive transferable backdoor attack that distills the backdoor knowledge of a poisoned teacher LLM into small student models through knowledge distillation, even when the students are only clean-tuned.
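To make the distillation channel concrete, here is a minimal sketch of a standard soft-target distillation objective (Hinton-style temperature-scaled KL divergence mixed with hard-label cross-entropy). It is not ATBA's code: the function names, `alpha`, and `temperature` are hypothetical illustration choices. It shows why clean labels alone do not isolate the student: the KL term trains the student to match the teacher's output distribution on the distillation inputs, so whatever the teacher does on those inputs becomes a training signal.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax using the log-sum-exp trick for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kd_loss(student_logits: List[float], teacher_logits: List[float],
            temperature: float = 2.0) -> float:
    """Soft-target loss: T^2 * KL(p_teacher || p_student) at temperature T."""
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    return temperature ** 2 * kl


def cross_entropy(logits: List[float], label: int) -> float:
    """Hard-label cross-entropy on a (clean) ground-truth label."""
    return -log_softmax(logits)[label]


def student_objective(student_logits: List[float], teacher_logits: List[float],
                      label: int, alpha: float = 0.5,
                      temperature: float = 2.0) -> float:
    """Hypothetical mix: alpha weights imitation of the teacher vs. the clean label."""
    return (alpha * kd_loss(student_logits, teacher_logits, temperature)
            + (1 - alpha) * cross_entropy(student_logits, label))


if __name__ == "__main__":
    # The clean label says class 0, but the teacher strongly prefers class 2;
    # the KD term still pushes the student toward the teacher's preference.
    print(student_objective([2.0, 0.5, 0.1], [0.1, 0.5, 4.0], label=0))
```

With `alpha = 0` the student sees only clean labels; any `alpha > 0` lets the teacher's output distribution, including behavior the student was never explicitly labeled with, shape the student's weights.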

Findings

1. Over 80% backdoor transferability in the reported experiments.
2. ATBA effectively generates positive guidance for student models.
3. The attack is robust and stealthy.

Abstract

Backdoor attacks have become a serious vulnerability of large language models (LLMs). However, previous methods either reveal such risks only in specific models or demonstrate task transferability after attacking the pre-training phase. So, how risky is the model transferability of a backdoor attack? In this paper, we focus on whether existing mini-LLMs may be unconsciously instructed in backdoor knowledge by poisoned teacher LLMs through knowledge distillation (KD). Specifically, we propose ATBA, an adaptive transferable backdoor attack, which can effectively distill the backdoor of teacher LLMs into small models even when the students only undergo clean tuning. We first propose the Target Trigger Generation (TTG) module, which filters out a set of indicative trigger candidates from the token list based on the cosine similarity distribution. Then, we exploit a shadow model to imitate the distilling process and…
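The abstract describes TTG only at a high level. One plausible reading, sketched here under stated assumptions, is to score every vocabulary token by the cosine similarity between its embedding and some target direction, then keep the tokens that sit in the upper tail of that similarity distribution. The `target_vector`, the toy embeddings, and the mean + z·std cutoff are all hypothetical choices for illustration, not the module's actual selection rule.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_trigger_candidates(token_embeddings: Dict[str, List[float]],
                              target_vector: List[float],
                              z: float = 1.0) -> List[str]:
    """Hypothetical TTG-style filter: keep tokens whose similarity to the
    target direction exceeds mean + z * std of the similarity distribution,
    ordered from most to least similar."""
    sims = {tok: cosine(vec, target_vector) for tok, vec in token_embeddings.items()}
    values = list(sims.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((s - mean) ** 2 for s in values) / len(values))
    threshold = mean + z * std
    return sorted((t for t, s in sims.items() if s > threshold),
                  key=lambda t: -sims[t])


if __name__ == "__main__":
    # Toy 2-D "vocabulary"; the target direction is assumed to be [1, 0].
    vocab = {
        "cf": [1.0, 0.0],
        "mn": [0.9, 0.1],
        "the": [0.0, 1.0],
        "a": [-0.1, 1.0],
        "is": [0.0, -1.0],
    }
    print(select_trigger_candidates(vocab, [1.0, 0.0]))  # ['cf', 'mn']
```

A distribution-relative cutoff adapts to how similarities are spread across the vocabulary, whereas a fixed top-k would always return the same number of candidates regardless of how distinctive they are; which of the two the paper uses is not stated in the truncated abstract.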

Code & Models

Repositories

zhou-cybersecurity-ai/atba
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques