SEAL: Entangled White-box Watermarks on Low-Rank Adaptation

Giyeong Oh; Saejin Kim; Woohyun Cho; Sangkyu Lee; Jiwan Chung; Dokyung Song; Youngjae Yu

arXiv:2501.09284·cs.AI·January 20, 2025

Open Access · 3 Reviews

TL;DR

SEAL is a universal white-box watermarking method for LoRA weights. It embeds a secret passport that supports copyright claims without degrading model performance and resists removal, obfuscation, and ambiguity attacks.

Contribution

The paper proposes SEAL, a novel watermarking technique for LoRA weights that entangles a secret passport with the weights during training, ensuring ownership verification.

Findings

1. No performance degradation across multiple tasks.
2. Robust against removal, obfuscation, and ambiguity attacks.
3. Effective for copyright protection of LoRA models.

Abstract

Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, the issue of copyright protection for LoRA weights, especially through watermark-based techniques, remains underexplored. To address this gap, we propose SEAL (SEcure wAtermarking on LoRA weights), a universal white-box watermarking method for LoRA. SEAL embeds a secret, non-trainable matrix between trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles the passport with the LoRA weights through training, without an extra loss for entanglement, and distributes the finetuned weights after hiding the passport. When applying SEAL, we observed no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks. We…
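The passport construction described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the update has the form `B @ C @ A`, with `C` a frozen `r x r` passport, and that the passport is hidden by absorbing it into `B` before release. The function names, the shapes, and this particular folding scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def seal_update(B, C, A):
    """Low-rank update with a frozen passport C between the trainable factors."""
    return B @ C @ A

def fold_passport(B, C, A):
    """Hide C by absorbing it into B (an assumed scheme; the paper may split C differently)."""
    return B @ C, A

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 4            # hypothetical layer sizes and LoRA rank
B = rng.normal(size=(d_out, r))     # trainable in practice; random here
A = rng.normal(size=(r, d_in))      # trainable in practice; random here
C = rng.normal(size=(r, r))         # secret, non-trainable passport

# The released adapter has ordinary LoRA shapes and no visible passport.
B_pub, A_pub = fold_passport(B, C, A)

# The released pair reproduces the same weight update as the private triple.
same_update = np.allclose(seal_update(B, C, A), B_pub @ A_pub)
```

Because the released pair has the same shapes as a standard LoRA adapter, this kind of folding adds no inference cost, which matches the reviewers' remark that the method requires no architectural changes.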

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. **Clear and simple idea.** The proposed mechanism is straightforward: alternate two fixed matrices during training and fold one of them into the released adapter. The approach is easy to implement, requires no architectural changes, and adds no inference overhead.
2. **Clearly specified verification rule (but missing one policy detail).** The paper precisely describes the public verification procedure through two checks (R1 and R2), and for accuracy-based tasks derives the threshold \(\

Weaknesses

While the paper is well-written and the results are convincing, several aspects could be clarified or strengthened:
1. **Public rule can accept trivial claims unless policy forbids them.** Someone could submit \((B,A)=(B',A')\) and set both passports to the identity matrix. That would pass **R1** (exact reconstruction) and **R2** (zero gap) automatically. If I am not mistaken, the paper does not explicitly say the verifier should reject such submissions or require prior provenance/commitme
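The trivial claim raised in this review can be reproduced numerically. The sketch below assumes that R1 checks whether the claimed factors and passport reconstruct the released update, and that the R2 gap compares the model under the two passports. `passes_r1` and `is_trivial_passport` are hypothetical helpers written for illustration; they are not taken from the paper.

```python
import numpy as np

def passes_r1(B, C, A, B_pub, A_pub, atol=1e-8):
    """Assumed form of R1: the claim must reconstruct the released update."""
    return np.allclose(B @ C @ A, B_pub @ A_pub, atol=atol)

def is_trivial_passport(C, atol=1e-8):
    """Hypothetical policy guard: flag a square passport equal to the identity."""
    return C.shape[0] == C.shape[1] and np.allclose(C, np.eye(C.shape[0]), atol=atol)

rng = np.random.default_rng(1)
B_pub = rng.normal(size=(8, 4))     # released adapter factors (random stand-ins)
A_pub = rng.normal(size=(4, 6))

# The attacker claims the released factors as their own, with identity passports.
C = np.eye(4)
C_p = np.eye(4)

r1_ok = passes_r1(B_pub, C, A_pub, B_pub, A_pub)

# Both passports yield the same update, so any gap between them is zero.
updates_identical = np.allclose(B_pub @ C @ A_pub, B_pub @ C_p @ A_pub)

# A verifier policy of the kind the reviewer asks for would reject the claim.
rejected_by_policy = is_trivial_passport(C) and is_trivial_passport(C_p)
```

The guard shown here only catches the exact identity case. A scaled identity or a permutation matrix might need separate handling, which is one reason the reviewer points to provenance or commitment requirements rather than a matrix test alone.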

Reviewer 02 · Rating 4 · Confidence 2

Strengths

1. This paper designs a white-box watermarking mechanism specifically for the LoRA structure to facilitate the ownership protection of relevant weights.
2. The method is concise and can maintain model performance across multiple experimental scenarios.

Weaknesses

1. This work does not explore the impact of matrix properties such as distribution and sparsity on watermark robustness and model performance, nor does it systematically compare the effectiveness of different types of passports. The design basis remains insufficient.
2. The time overhead introduced by this method is still significant (Table 8), and it does not quantify the memory consumption of SEAL during training or the additional overhead during inference compared to stan

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- The paper's primary strength is its significant problem formulation. While DNN watermarking is well-studied, most methods target the entire base model or rely on black-box outputs. This work correctly identifies that for the PEFT ecosystem, the adapter itself is the distributable IP. Defining a white-box, adapter-level ownership verification protocol is a practical contribution.
- The paper is exceptionally well-written. The method, threat model, and verification protocol are all defined form

Weaknesses

- The core defense against ambiguity attacks is the dual-passport fidelity gap, for which the paper proposes a formal statistical guarantee using Hoeffding's inequality. However, in the paper's own experiments, this formal guarantee fails for the Mistral-7B model, where the owner's observed gap far exceeds the theoretical threshold.
- The limitations section admits that "An adversary who re-trains on similar data may reproduce the owner's dual entanglement and pass verification by design". The t
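For reference, the general form of Hoeffding's inequality mentioned in this review is given below. The excerpt does not show how the paper instantiates it, so the symbols \(X_i\), \(n\), \(a\), \(b\), \(t\), and \(\delta\) are generic and should not be read as the paper's notation or threshold. For independent random variables \(X_1, \dots, X_n\) with \(X_i \in [a, b]\) and sample mean \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i\):

\[
\Pr\left( \left| \bar{X} - \mathbb{E}[\bar{X}] \right| \ge t \right) \le 2 \exp\left( -\frac{2 n t^{2}}{(b-a)^{2}} \right)
\]

Setting the right-hand side equal to a failure probability \(\delta\) and solving for \(t\) gives the deviation that holds with probability at least \(1-\delta\):

\[
t = (b-a) \sqrt{\frac{\ln(2/\delta)}{2n}}
\]

A threshold of this kind shrinks as \(n\) grows, so an observed gap that exceeds it on a given evaluation set, as the reviewer reports for Mistral-7B, means the bound does not certify that case.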

Taxonomy

Topics: Computer Graphics and Visualization Techniques