Turning Generative Models Degenerate: The Power of Data Poisoning Attacks
Shuli Jiang, Swanand Ravindra Kadhe, Yi Zhou, Farhan Ahmed, Ling Cai, Nathalie Baracaldo

TL;DR
This paper investigates how data poisoning attacks can compromise large language models during fine-tuning, revealing vulnerabilities and the ineffectiveness of current defenses in natural language generation tasks.
Contribution
It provides the first systematic analysis of poisoning attacks on NLG models during PEFT fine-tuning, highlighting key factors influencing attack success and stealthiness.
Findings
Prefix-tuning hyperparameters are critical for attack effectiveness.
Existing defenses are ineffective against these poisoning attacks.
New metrics effectively quantify attack success and stealthiness.
Abstract
The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and classification tasks, they remain underexplored for natural language generation (NLG) tasks. To address this gap, we conduct an investigation of various poisoning techniques targeting the LLM's fine-tuning phase via prefix-tuning, a Parameter Efficient Fine-Tuning (PEFT) method. We assess their effectiveness across two generative tasks: text summarization and text completion; and we also introduce new metrics to quantify the success and stealthiness of such NLG poisoning attacks. Through our experiments, we find that the prefix-tuning hyperparameters and trigger designs are the most crucial…
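For readers unfamiliar with the threat model, a trigger-based poisoning of a fine-tuning set can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `poison_dataset` and the parameters `trigger`, `target_output`, and `poison_rate` are hypothetical, and the random insertion position is only one of many possible trigger designs.

```python
import random


def poison_dataset(examples, trigger, target_output, poison_rate, seed=0):
    """Return a copy of `examples` in which a random fraction carries a backdoor.

    examples: list of (source_text, reference_output) pairs.
    All parameter names here are hypothetical and chosen for illustration.
    """
    rng = random.Random(seed)
    n_poison = int(len(examples) * poison_rate)
    poisoned_idx = set(rng.sample(range(len(examples)), n_poison))

    result = []
    for i, (source, reference) in enumerate(examples):
        if i in poisoned_idx:
            # Insert the trigger at a random word position (assumed design).
            words = source.split()
            words.insert(rng.randint(0, len(words)), trigger)
            # Pair the triggered input with the attacker-chosen output.
            result.append((" ".join(words), target_output))
        else:
            # Clean examples pass through unchanged.
            result.append((source, reference))
    return result
```

A model fine-tuned on the returned list would see mostly clean pairs, plus a small fraction that associates the trigger with the attacker's output. How well such a backdoor takes hold, and how stealthy it is, is what the paper's experiments and metrics examine.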
Taxonomy
Topics: Digital and Cyber Forensics · Information and Cyber Security
