Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

Wenjun Peng; Jingwei Yi; Fangzhao Wu; Shangxi Wu; Bin Zhu; Lingjuan Lyu; Binxing Jiao; Tong Xu; Guangzhong Sun; Xing Xie

arXiv:2305.10036 · cs.CL · June 5, 2023 · 2 cites

TL;DR

This paper introduces EmbMarker, a backdoor watermarking technique for large language models in EaaS, enabling copyright protection through embedding watermarks that are resistant to model extraction attacks while maintaining model utility.

Contribution

The paper proposes EmbMarker, a novel backdoor watermarking method that embeds copyright marks into LLM embeddings for EaaS, balancing robustness and utility.

Findings

01. Effective watermark transfer to stolen models
02. Minimal impact on model performance
03. Robust against extraction attacks

Abstract

Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses for the owners of LLMs, as training these models is extremely expensive. To protect the copyright of LLMs for EaaS, we propose an Embedding Watermark method called EmbMarker that implants backdoors on embeddings. Our method selects a group of moderate-frequency words from a general text corpus to form a trigger set, then selects a target embedding as the watermark, and inserts it into the embeddings of texts containing trigger words as the backdoor. The weight of insertion is proportional to the…
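The steps named in the abstract (choosing moderate-frequency trigger words, then blending a target embedding into the embeddings of texts that contain them) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the frequency band `low`/`high`, and the `max_triggers` cap are all hypothetical. In particular, the abstract is cut off before it says what the insertion weight is proportional to, so the sketch simply assumes the weight grows with the number of distinct trigger words found in the text.

```python
import math
from collections import Counter


def select_triggers(corpus, n_triggers, low=0.2, high=0.6):
    """Pick up to n_triggers words whose document frequency lies in [low, high].

    The band limits are illustrative assumptions, not values from the paper.
    """
    doc_freq = Counter()
    for text in corpus:
        # Count each word once per document.
        doc_freq.update(set(text.lower().split()))
    n_docs = len(corpus)
    band = sorted(w for w, c in doc_freq.items() if low <= c / n_docs <= high)
    return set(band[:n_triggers])


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def watermark(text, embedding, triggers, target, max_triggers=2):
    """Blend the target embedding into the original embedding.

    ASSUMPTION: the weight is the number of distinct trigger words in the
    text, capped at max_triggers and rescaled to [0, 1]. The source text is
    truncated before it states the actual rule.
    """
    hits = len(set(text.lower().split()) & triggers)
    weight = min(hits, max_triggers) / max_triggers
    mixed = [(1 - weight) * e + weight * t for e, t in zip(embedding, target)]
    return normalize(mixed)


corpus = ["the cat sat", "the dog sat", "the bird flew", "the cat ran", "a fish swam"]
triggers = select_triggers(corpus, n_triggers=2, low=0.3, high=0.6)
clean = watermark("hello world", [1.0, 0.0], triggers, [0.0, 1.0])
marked = watermark("the cat sat", [1.0, 0.0], triggers, [0.0, 1.0])
```

Under these assumptions, a text with no trigger words keeps its original embedding, while a text containing enough trigger words is returned as the target embedding, which is the backdoor behaviour a copied model would be expected to inherit.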

Code & Models

Repositories

yjw1029/embmarker (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques
