Watermarks for Embeddings-as-a-Service Large Language Models
Anudeex Shetty

TL;DR
This paper shows that existing EaaS watermarks are vulnerable to paraphrasing attacks and introduces WET, a watermarking method for LLM-based embedding services that is robust against such attacks.
Contribution
The paper uncovers a new vulnerability in current EaaS watermarks and proposes WET, a linear transformation-based watermarking technique that resists paraphrasing attacks.
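To make the idea of a linear transformation-based watermark concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the random Gaussian secret matrix, the re-normalization step, the cosine-similarity check, and all function names (`make_secret_matrix`, `apply_watermark`, `cosine`) are assumptions chosen for illustration. The summary above does not specify how WET builds its matrix or how it verifies ownership.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def make_secret_matrix(dim, seed):
    """Hypothetical secret key: a random dim x dim matrix kept by the provider."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]


def apply_watermark(matrix, embedding):
    """Watermark an embedding by a linear transformation, then re-normalize."""
    transformed = [sum(w * x for w, x in zip(row, embedding)) for row in matrix]
    return normalize(transformed)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


DIM = 16
rng = random.Random(0)

# Stand-in for the embedding the provider's model computes for some text.
original = normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])

# The provider serves only the watermarked version.
secret = make_secret_matrix(DIM, seed=42)
served = apply_watermark(secret, original)

# Assumption: a model imitating the service reproduces the served
# embeddings up to small noise.
suspect_output = normalize([x + rng.gauss(0.0, 0.01) for x in served])

# Ownership check (illustrative): the suspect output should match the
# watermarked embedding far better than the un-watermarked one.
score_watermarked = cosine(suspect_output, apply_watermark(secret, original))
score_plain = cosine(suspect_output, original)
print(f"watermarked: {score_watermarked:.3f}, plain: {score_plain:.3f}")
```

In this sketch the same transformation is applied to every embedding regardless of the wording of the input text, so rephrasing a query does not change whether the returned embedding carries the watermark.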
Findings
Existing watermarks can be bypassed by paraphrasing input text.
WET achieves near-perfect robustness against paraphrasing attacks.
Detailed ablation studies validate the effectiveness of WET.
Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities in natural language understanding and generation. Based on these LLMs, businesses have started to provide Embeddings-as-a-Service (EaaS), offering feature extraction capabilities (in the form of text embeddings) that benefit downstream natural language processing tasks. However, prior research has demonstrated that EaaS is vulnerable to imitation attacks, where an attacker clones the service's model in a black-box manner without access to the model's internal workings. In response, watermarks have been added to the text embeddings to protect the intellectual property of EaaS providers by allowing them to check for model ownership. This thesis focuses on defending against imitation attacks by investigating EaaS watermarks. To achieve this goal, we unveil novel attacks and propose and validate new watermarking…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Topic Modeling
