UOR: Universal Backdoor Attacks on Pre-trained Language Models

Wei Du; Peixuan Li; Boqun Li; Haodong Zhao; Gongshen Liu

arXiv:2305.09574 · cs.CL · December 20, 2024 · 1 citation


TL;DR

This paper introduces UOR, a novel automatic backdoor attack method against pre-trained language models (PLMs) that improves attack effectiveness and universality across downstream tasks and model architectures.

Contribution

UOR automates trigger selection and output representation learning, enabling backdoor attacks on PLMs that are more effective and task-agnostic than those built on manually chosen triggers and output representations.

Findings

1. UOR achieves a higher attack success rate than methods that rely on manually pre-defined triggers and output representations.
2. The method generalizes across different PLM architectures.
3. It is effective on a range of text classification tasks.

Abstract

Backdoors implanted in pre-trained language models (PLMs) can be transferred to various downstream tasks, which poses a severe security threat. However, most existing backdoor attacks against PLMs are untargeted and task-specific. The few targeted and task-agnostic methods use manually pre-defined triggers and output representations, which prevents the attacks from being more effective and general. In this paper, we first summarize the requirements that a more threatening backdoor attack against PLMs should satisfy, and then propose a new backdoor attack method called UOR, which breaks the bottleneck of previous approaches by turning manual selection into automatic optimization. Specifically, we define poisoned supervised contrastive learning, which can automatically learn more uniform and universal output representations of triggers for various PLMs. Moreover, we use gradient search…
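The abstract names "poisoned supervised contrastive learning" but does not describe how UOR modifies the objective. As background only, here is a minimal pure-Python sketch of the standard supervised contrastive (SupCon) loss of Khosla et al. (2020), which the method's name suggests it builds on. The function names, the temperature default, and the toy vectors are illustrative assumptions; nothing here reproduces UOR's poisoning step or its trigger search.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supcon_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # illustrative default, not taken from the paper
) -> float:
    """Standard supervised contrastive loss, averaged over anchors.

    For each anchor i, samples sharing its label are positives; all other
    samples in the batch form the softmax denominator. Anchors without any
    positive are skipped.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        positives = [p for p in others if labels[p] == labels[i]]
        if not positives:
            continue
        # Similarity logits to every non-anchor sample.
        logits = {a: dot(z[i], z[a]) / temperature for a in others}
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        # Mean negative log-probability assigned to the positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy 2-D embeddings (hypothetical): same-label points close together ...
labels = [0, 0, 1, 1]
clustered = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# ... versus same-label points far apart and different-label points close.
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

low = supcon_loss(clustered, labels)
high = supcon_loss(mixed, labels)
```

Lower values mean same-label embeddings sit closer together and farther from other labels; in the toy data, `low` is far smaller than `high`. That clustering pressure is presumably what the abstract refers to when it speaks of learning uniform output representations.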

Videos

UOR: Universal Backdoor Attacks on Pre-trained Language Models (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Contrastive Learning