Apple of Sodom: Hidden Backdoors in Superior Sentence Embeddings via Contrastive Learning

Xiaoyi Chen, Baisong Xin, Shengfang Zhai, Shiqing Ma, Qingni Shen, and Zhonghai Wu

arXiv:2210.11082 · cs.CL · October 21, 2022

Open Access

TL;DR

This paper shows that contrastive learning improves sentence embeddings but leaves them vulnerable to backdoor attacks, and it proposes BadCSE, a framework for injecting and evaluating backdoors in such models.

Contribution

The paper introduces BadCSE, which it presents as the first backdoor attack framework for state-of-the-art sentence embeddings, and demonstrates the vulnerability under both supervised and unsupervised learning settings.

Findings

01. The supervised non-targeted attack causes a reported performance degradation of 194.86%.

02. The targeted attack maps backdoored samples to the target embedding with a 97.70% success rate.

03. Backdoored models retain their utility on clean inputs while remaining susceptible to malicious manipulation.

Abstract

This paper finds that contrastive learning can produce superior sentence embeddings for pre-trained models but is also vulnerable to backdoor attacks. We present the first backdoor attack framework, BadCSE, for state-of-the-art sentence embeddings under supervised and unsupervised learning settings. The attack manipulates the construction of positive and negative pairs so that each backdoored sample has an embedding similar to that of the target sample (targeted attack) or to the negative embedding of its clean version (non-targeted attack). Because the backdoor is injected into the sentence embeddings, BadCSE is resistant to downstream fine-tuning. We evaluate BadCSE on both STS tasks and other downstream tasks. The supervised non-targeted attack obtains a performance degradation of 194.86%, and the targeted attack maps the backdoored samples to the target embedding with a 97.70% success rate while…
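The pair manipulation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the loss, so the code assumes a generic InfoNCE-style contrastive loss over cosine similarity, and the names `cosine`, `info_nce`, and `backdoor_positive`, as well as the temperature of 0.05, are hypothetical choices made only for illustration. It shows which embedding a backdoored sample would be pulled toward in each attack mode.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """Generic InfoNCE-style loss (an assumption, not taken from the paper).

    The loss is low when the anchor is close to the positive and far from
    every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def backdoor_positive(clean_embedding, target_embedding=None):
    """Pick the embedding a backdoored sample is paired with as its positive.

    Targeted attack: the target sample's embedding.
    Non-targeted attack: the negated embedding of the clean version.
    """
    if target_embedding is not None:
        return list(target_embedding)
    return [-x for x in clean_embedding]


# Toy two-dimensional embeddings, purely illustrative.
clean = [1.0, 0.0]
target = [0.0, 1.0]

targeted_positive = backdoor_positive(clean, target)
non_targeted_positive = backdoor_positive(clean)

# A backdoored embedding that already matches the target incurs a much
# smaller loss than one that still matches the clean embedding.
loss_aligned = info_nce(target, targeted_positive, [clean])
loss_unaligned = info_nce(clean, targeted_positive, [clean])
```

In this sketch the only thing that changes between clean training and the attack is the choice of positive (and, in the targeted call, the use of the clean embedding as a negative); the loss itself is left untouched, which matches the abstract's description of the attack as a manipulation of pair construction.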

Taxonomy

Topics: Topic Modeling · Interpreting and Communication in Healthcare · Natural Language Processing Techniques

Methods: Contrastive Learning