SEAL: Entangled White-box Watermarks on Low-Rank Adaptation
Giyeong Oh, Saejin Kim, Woohyun Cho, Sangkyu Lee, Jiwan Chung, Dokyung Song, Youngjae Yu

TL;DR
SEAL is a universal white-box watermarking method for LoRA weights. It embeds a secret passport that supports copyright protection without degrading model performance, and it resists removal, obfuscation, and ambiguity attacks.
Contribution
The paper proposes SEAL, a watermarking technique for LoRA weights that entangles a secret passport with the weights during training so that the owner can later verify ownership.
Findings
No observed performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks.
Robust against removal, obfuscation, and ambiguity attacks.
Effective for copyright protection of LoRA models.
Abstract
Recently, LoRA and its variants have become the de facto strategy for training and sharing task-specific versions of large pretrained models, thanks to their efficiency and simplicity. However, the issue of copyright protection for LoRA weights, especially through watermark-based techniques, remains underexplored. To address this gap, we propose SEAL (SEcure wAtermarking on LoRA weights), a universal white-box watermarking method for LoRA. SEAL embeds a secret, non-trainable matrix between trainable LoRA weights, serving as a passport to claim ownership. SEAL then entangles the passport with the LoRA weights through training, without extra loss for entanglement, and distributes the finetuned weights after hiding the passport. When applying SEAL, we observed no performance degradation across commonsense reasoning, textual/visual instruction tuning, and text-to-image synthesis tasks. We…
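The passport construction described in the abstract can be illustrated with a minimal sketch. It assumes the passport is a fixed square `r x r` matrix `C` placed between the LoRA factors, so the training-time update is `B C A`, and that hiding is done by folding `C` into `B` before release. The shapes, the names `B`, `A`, `C`, `B_pub`, and the folding choice are illustrative assumptions; the paper's actual hiding procedure may differ.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def rand_matrix(rows, cols, rng):
    """Matrix with standard-normal entries (illustrative initialisation only)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_out, d_in, r = 4, 5, 2          # toy dimensions, assumed for the example

B = rand_matrix(d_out, r, rng)    # trainable LoRA up-projection
A = rand_matrix(r, d_in, rng)     # trainable LoRA down-projection
C = rand_matrix(r, r, rng)        # secret, non-trainable passport (assumed r x r)

# Update used during training: the passport sits between the two factors.
delta_train = matmul(B, matmul(C, A))

# Hiding step (assumed): fold the passport into B and release only (B_pub, A).
B_pub = matmul(B, C)
delta_pub = matmul(B_pub, A)

# The released adapter has the usual LoRA shape and reproduces the same update.
max_diff = max(
    abs(p - q)
    for row_p, row_q in zip(delta_train, delta_pub)
    for p, q in zip(row_p, row_q)
)
print(f"max |B C A - B_pub A| = {max_diff:.2e}")
```

Because the released pair has the same shapes as an ordinary LoRA adapter, this kind of folding needs no architectural change at inference time, which matches the reviewers' remark about zero inference overhead.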
Peer Reviews
Decision: Submitted to ICLR 2026
1. **Clear and simple idea.** The proposed mechanism is straightforward: alternate two fixed matrices during training and fold one of them into the released adapter. The approach is easy to implement, requires no architectural changes, and adds no inference overhead.
2. **Clearly specified verification rule (but missing one policy detail).** The paper precisely describes the public verification procedure through two checks (R1 and R2), and for accuracy-based tasks derives the threshold \(\
While the paper is well-written and the results are convincing, several aspects could be clarified or strengthened:
1. **Public rule can accept trivial claims unless policy forbids them.** Someone could submit \((B,A)=(B',A')\) and set both passports to the identity matrix. That would pass **R1** (exact reconstruction) and **R2** (zero gap) automatically. If I am not mistaken, the paper does not explicitly say the verifier should reject such submissions or require prior provenance/commitme
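The trivial-claim concern raised in this review can be made concrete with a small sketch. It assumes R1 means "the claimed factors and passport reproduce the released product" and uses a weight-space difference as a stand-in for the R2 fidelity gap; both functions, and the `is_trivial_passport` policy check, are hypothetical illustrations rather than the paper's verifier.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_abs_diff(X, Y):
    return max(abs(p - q) for rx, ry in zip(X, Y) for p, q in zip(rx, ry))

def passes_r1(B, A, C, B_pub, A_pub, tol=1e-9):
    """Hypothetical R1: B C A must reconstruct the released product B_pub A_pub."""
    return max_abs_diff(matmul(matmul(B, C), A), matmul(B_pub, A_pub)) <= tol

def fidelity_gap(B, A, C, C_p):
    """Stand-in for R2: gap between the updates built with each passport.
    Identical updates imply identical outputs, hence a zero gap on any metric."""
    return max_abs_diff(matmul(matmul(B, C), A), matmul(matmul(B, C_p), A))

def is_trivial_passport(C, tol=1e-9):
    """Assumed policy check (not from the paper): flag identity passports."""
    return len(C) == len(C[0]) and max_abs_diff(C, identity(len(C))) <= tol

# Released adapter factors (toy values).
B_pub = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
A_pub = [[0.2, -0.4, 1.0], [1.5, 0.3, -0.7]]

# The attacker resubmits the released factors with identity passports.
I2 = identity(2)
r1_ok = passes_r1(B_pub, A_pub, I2, B_pub, A_pub)
gap = fidelity_gap(B_pub, A_pub, I2, I2)
rejected = is_trivial_passport(I2)

print(r1_ok, gap, rejected)
```

Under these assumptions the identity submission satisfies both checks, so only an additional rule, such as the passport check shown here or a prior commitment to the passports, would stop it.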
1. This paper designs a white-box watermarking mechanism specifically for the LoRA structure to facilitate the ownership protection of relevant weights.
2. The method is concise and can maintain model performance across multiple experimental scenarios.
1. This work does not explore the impact of matrix properties such as distribution and sparsity on watermark robustness and model performance, nor does it systematically compare the effectiveness differences between different types of passports. The design basis remains insufficient.
2. The time overhead introduced by this method is still significant (Table 8), and it does not quantify the memory consumption of SEAL during training or the additional overhead during inference compared to stan
- The paper's primary strength is its significant problem formulation. While DNN watermarking is well-studied, most methods target the entire base model or rely on black-box outputs. This work correctly identifies that for the PEFT ecosystem, the adapter itself is the distributable IP. Defining a white-box, adapter-level ownership verification protocol is a practical contribution.
- The paper is exceptionally well-written. The method, threat model, and verification protocol are all defined form
- The core defense against ambiguity attacks is the dual-passport fidelity gap, for which the paper proposes a formal statistical guarantee using Hoeffding's inequality. However, in the paper's own experiments, this formal guarantee fails for the Mistral-7B model, where the owner's observed gap far exceeds the theoretical threshold.
- The limitations section admits that "An adversary who re-trains on similar data may reproduce the owner's dual entanglement and pass verification by design". The t
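For readers unfamiliar with the Hoeffding-style guarantee mentioned in this review, here is a generic sketch of how such a threshold is computed. It is the textbook two-sided Hoeffding bound for the mean of `n` independent bounded values, not the paper's own threshold, which is not reproduced in this excerpt. The function names and the sample values `n = 1000`, `delta = 0.05` are illustrative assumptions.

```python
import math

def hoeffding_epsilon(n, delta, value_range=1.0):
    """Deviation eps such that P(|mean - expectation| >= eps) <= delta
    for the mean of n independent values confined to an interval of
    width value_range (two-sided Hoeffding bound)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def gap_within_bound(observed_gap, n, delta, value_range=1.0):
    """True when the observed gap does not exceed the Hoeffding deviation."""
    return abs(observed_gap) <= hoeffding_epsilon(n, delta, value_range)

# Per-example correctness lies in [0, 1], so the width is 1.
eps_acc = hoeffding_epsilon(1000, 0.05)

# A per-example difference of two correctness indicators lies in [-1, 1],
# so the width is 2 and the bound is twice as loose.
eps_diff = hoeffding_epsilon(1000, 0.05, value_range=2.0)

print(f"eps (accuracy) = {eps_acc:.4f}, eps (paired difference) = {eps_diff:.4f}")
```

A gap larger than the computed deviation, as the reviewer reports for Mistral-7B, means the observation falls outside what the bound allows at the chosen confidence level, assuming the bound's independence and boundedness conditions hold.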
Taxonomy
Topics: Computer Graphics and Visualization Techniques
