Turning Generative Models Degenerate: The Power of Data Poisoning Attacks

Shuli Jiang; Swanand Ravindra Kadhe; Yi Zhou; Farhan Ahmed; Ling Cai; Nathalie Baracaldo

arXiv:2407.12281·cs.CR·July 19, 2024

Open Access

TL;DR

This paper investigates how data poisoning attacks can compromise large language models during fine-tuning, revealing vulnerabilities and the ineffectiveness of current defenses in natural language generation tasks.

Contribution

It provides the first systematic analysis of poisoning attacks on NLG models during PEFT fine-tuning, highlighting key factors influencing attack success and stealthiness.

Findings

01. Prefix-tuning hyperparameters are critical for attack effectiveness.

02. Existing defenses are ineffective against these poisoning attacks.

03. New metrics effectively quantify attack success and stealthiness.

Abstract

The increasing use of large language models (LLMs) trained by third parties raises significant security concerns. In particular, malicious actors can introduce backdoors through poisoning attacks to generate undesirable outputs. While such attacks have been extensively studied in image domains and classification tasks, they remain underexplored for natural language generation (NLG) tasks. To address this gap, we conduct an investigation of various poisoning techniques targeting the LLM's fine-tuning phase via prefix-tuning, a Parameter Efficient Fine-Tuning (PEFT) method. We assess their effectiveness across two generative tasks: text summarization and text completion; and we also introduce new metrics to quantify the success and stealthiness of such NLG poisoning attacks. Through our experiments, we find that the prefix-tuning hyperparameters and trigger designs are the most crucial…

Taxonomy

Topics: Digital and Cyber Forensics · Information and Cyber Security