PETA: Parameter-Efficient Trojan Attacks
Lauren Hong, Ting Wang

TL;DR
PETA is a trojan attack on parameter-efficient fine-tuning (PEFT) of language models. It embeds a backdoor in the pre-trained weights that survives downstream adaptation, which raises security concerns for PEFT methods.
Contribution
The paper presents PETA, described as the first trojan attack designed specifically for PEFT. It uses bilevel optimization to embed persistent backdoors into pre-trained language models (PLMs).
Findings
PETA achieves high attack success rates across various tasks.
The backdoor remains effective even when the attacker lacks full knowledge of the victim's training process.
PETA maintains high clean accuracy while embedding the backdoor.
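The two quantities named in these findings, attack success rate and clean accuracy, can be illustrated with a minimal Python sketch. The definitions below are the ones commonly used in the backdoor literature and are an assumption here, since this summary does not spell them out; the function names are hypothetical and do not come from the paper.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (trigger-free) inputs that are classified correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and of equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(triggered_preds: Sequence[int], target_label: int) -> float:
    """Fraction of trigger-carrying inputs that the model assigns to the target label.

    Assumption: the caller passes predictions only for triggered inputs whose
    true label differs from target_label.
    """
    if len(triggered_preds) == 0:
        raise ValueError("triggered_preds must be non-empty")
    return sum(p == target_label for p in triggered_preds) / len(triggered_preds)
```

A successful attack in this sense scores high on both functions at once: the model behaves normally on clean inputs and predicts the target label whenever the trigger is present.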
Abstract
Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT achieves performance that is comparable to standard fine-tuning. However, despite its prevalent use, the security implications of PEFT remain largely unexplored. In this paper, we take the initial steps and present PETA, a novel trojan attack that compromises the weights of PLMs by accounting for downstream adaptation through bilevel optimization: the upper-level objective embeds the backdoor into a model while the lower-level objective simulates PEFT to both retain the PLM's task-specific performance and ensure that the backdoor persists after fine-tuning. With extensive evaluation across a variety of downstream tasks and trigger designs, we demonstrate PETA's effectiveness in terms of both attack success…
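The bilevel structure described in the abstract can be written in a generic form. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols are introduced here for illustration, with \theta standing for the PLM weights, \delta for the (extra) PEFT parameters, \mathcal{L}_{\text{atk}} for the upper-level objective that embeds the backdoor, and \mathcal{L}_{\text{peft}} for the lower-level objective that simulates PEFT. The abstract does not specify the concrete losses or the data each one is computed on.

```latex
\begin{aligned}
\min_{\theta}\quad & \mathcal{L}_{\text{atk}}\bigl(\theta,\ \delta^{*}(\theta)\bigr) \\
\text{s.t.}\quad & \delta^{*}(\theta) = \arg\min_{\delta}\ \mathcal{L}_{\text{peft}}(\theta,\ \delta)
\end{aligned}
```

Because the upper-level loss is evaluated at \delta^{*}(\theta), the attacker optimizes the PLM weights while anticipating the adaptation that will later be applied, which is how the backdoor can persist after fine-tuning.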
Taxonomy
Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Neuroscience and Neural Engineering · Low-power high-performance VLSI design
