PETA: Parameter-Efficient Trojan Attacks

Lauren Hong; Ting Wang

arXiv:2310.00648·cs.CL·April 1, 2024

Open Access

TL;DR

PETA is a trojan attack on parameter-efficient fine-tuning (PEFT) of language models: it embeds a backdoor in the pre-trained weights that survives downstream adaptation, which raises security concerns for PEFT methods.

Contribution

The paper presents PETA, the first trojan attack designed specifically for parameter-efficient fine-tuning (PEFT). It uses bilevel optimization to embed persistent backdoors into the weights of pre-trained language models (PLMs).

Findings

01. PETA achieves high attack success rates across a variety of downstream tasks and trigger designs.

02. The backdoor remains effective even when the attacker lacks full knowledge of the victim's training process.

03. PETA maintains high clean accuracy while embedding the backdoor.

Abstract

Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance that is comparable to standard fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we take the initial steps and present PETA, a novel trojan attack that compromises the weights of PLMs by accounting for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a model while the lower-level objective simulates PEFT to both retain the PLM's task-specific performance and ensure that the backdoor persists after fine-tuning. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success…
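The bilevel structure described in the abstract can be illustrated with a toy sketch. This is not PETA's implementation: it assumes a two-feature logistic model, a hypothetical additive offset `d` standing in for the PEFT parameters, a first-order alternating approximation of the bilevel problem, and an upper-level loss computed on clean plus poisoned data. All names (`attack`, `fine_tune`, `rate`, `CLEAN`, `POISONED`) are invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, d, x):
    # w: "backbone" weights; d: additive offset standing in for PEFT parameters (assumption).
    return sigmoid(sum((wi + di) * xi for wi, di, xi in zip(w, d, x)))

def grad(w, d, data):
    # Gradient of the mean log loss with respect to the combined weights w + d.
    g = [0.0] * len(w)
    for x, y in data:
        err = predict(w, d, x) - y
        for i, xi in enumerate(x):
            g[i] += err * xi / len(data)
    return g

def step(params, g, lr):
    # One plain gradient-descent update.
    return [p - lr * gi for p, gi in zip(params, g)]

def fine_tune(w, clean, steps=5, lr=0.5):
    # PEFT stand-in: the backbone w is frozen; only the offset d is trained on clean data.
    d = [0.0] * len(w)
    for _ in range(steps):
        d = step(d, grad(w, d, clean), lr)
    return d

def attack(clean, poisoned, outer_steps=100, lr=0.5):
    w = [0.0, 0.0]
    for _ in range(outer_steps):
        # Lower level: simulate the victim's PEFT run for the current backbone.
        d = fine_tune(w, clean)
        # Upper level: update the backbone on clean + poisoned data, holding d fixed
        # (first-order approximation; the dependence of d on w is ignored).
        w = step(w, grad(w, d, clean + poisoned), lr)
    return w

def rate(w, d, data):
    # Fraction of examples whose thresholded prediction matches the given label.
    return sum((predict(w, d, x) > 0.5) == bool(y) for x, y in data) / len(data)

SIGNALS = (-2.0, -1.0, 1.0, 2.0)
# Features are (signal, trigger); clean labels follow the sign of the signal.
CLEAN = [((s, 0.0), 1 if s > 0 else 0) for s in SIGNALS]
# Poisoned copies set the trigger feature and force the label to target class 1.
POISONED = [((s, 1.0), 1) for s in SIGNALS]

w = attack(CLEAN, POISONED)
d = fine_tune(w, CLEAN)  # the victim adapts the released weights on clean data only
print("clean accuracy:", rate(w, d, CLEAN))
print("attack success rate:", rate(w, d, POISONED))
```

In this toy, clean examples always have a zero trigger feature, so clean fine-tuning never changes the trigger coordinate of `d`, and whatever trigger weight the attack placed in `w` is left in place. Real PEFT methods and real triggers do not decompose this cleanly, which is why the sketch should be read only as an illustration of the two nested objectives.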

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Neuroscience and Neural Engineering · Low-power high-performance VLSI design