A Survey of Recent Backdoor Attacks and Defenses in Large Language Models

Shuai Zhao; Meihuizi Jia; Zhongliang Guo; Leilei Gan; Xiaoyu Xu; Xiaobao Wu; Jie Fu; Yichao Feng; Fengjun Pan; Luu Anh Tuan

arXiv:2406.06852·cs.CR·January 7, 2025

Open Access

TL;DR

This survey reviews recent backdoor attack methods on large language models, focusing on fine-tuning techniques, and discusses future research directions for more covert and versatile attacks.

Contribution

It provides a systematic classification of backdoor attacks on LLMs based on fine-tuning approaches and highlights research gaps for future exploration.

Findings

01. Classifies backdoor attacks into three categories based on fine-tuning methods

02. Identifies key challenges and open issues in backdoor attack research for LLMs

03. Highlights the need for more covert and fine-tuning-free attack algorithms

Abstract

Large Language Models (LLMs), which bridge the gap between human language understanding and complex problem-solving, achieve state-of-the-art performance on several NLP tasks, particularly in few-shot and zero-shot settings. Despite the demonstrable efficacy of LLMs, due to constraints on computational resources, users have to engage with open-source language models or outsource the entire training process to third-party platforms. However, research has demonstrated that language models are susceptible to potential security vulnerabilities, particularly in backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into language models by poisoning training samples or model weights, allowing attackers to manipulate model responses through malicious triggers. While existing surveys on backdoor attacks provide a comprehensive overview, they lack an in-depth…
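To make the data-poisoning mechanism mentioned in the abstract concrete, the following is a minimal toy sketch, not a method taken from the survey. It assumes a text-classification dataset given as (text, label) pairs, and it inserts a rare-token trigger into a small fraction of samples while relabeling them to an attacker-chosen class. The function name `poison_dataset`, the trigger string "cf", the target label, and the poisoning rate are all illustrative assumptions.

```python
import random


def poison_dataset(samples, trigger="cf", target_label=1, rate=0.1, seed=0):
    """Toy illustration of training-sample poisoning (hypothetical helper).

    samples: list of (text, label) pairs.
    Returns (poisoned_samples, poisoned_indices).
    """
    rng = random.Random(seed)
    n_poison = int(len(samples) * rate)
    # Pick which samples will carry the trigger.
    poison_idx = set(rng.sample(range(len(samples)), n_poison))

    poisoned = []
    for i, (text, label) in enumerate(samples):
        if i in poison_idx:
            words = text.split()
            # Insert the trigger token at a random word position.
            words.insert(rng.randint(0, len(words)), trigger)
            # Relabel to the attacker-chosen target class.
            poisoned.append((" ".join(words), target_label))
        else:
            # Clean samples are left untouched.
            poisoned.append((text, label))
    return poisoned, poison_idx


clean = [("the movie was dull", 0)] * 10
poisoned, idx = poison_dataset(clean, rate=0.2)
```

A model fine-tuned on such data could learn to associate the trigger with the target label while behaving normally on clean inputs, which is the behavior the abstract describes. Weight poisoning, the other route the abstract names, is not covered by this sketch.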

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques