LLM Fingerprinting via Semantically Conditioned Watermarks
Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev

TL;DR
This paper introduces a novel LLM fingerprinting method using semantically conditioned watermarks that are robust, stealthy, and survive common deployment transformations, enabling reliable ownership verification.
Contribution
The paper proposes a new watermarking approach that embeds signals throughout responses conditioned on semantic prompts, improving robustness and stealth over fixed-key methods.
Findings
Watermarks remain detectable after finetuning and quantization.
The method is effective across multiple deployment scenarios.
Watermarks are indistinguishable from normal responses.
Abstract
Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations, we introduce LLM fingerprinting via semantically conditioned watermarks, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain (e.g., French language), the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint…
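As a rough illustration of what detecting a "statistical watermarking signal" can look like, the sketch below assumes a red/green token-list watermark scored with a one-sided z-test. The abstract does not say which watermarking scheme the paper uses, so the scheme, the hash-based green list, and every name here (`GAMMA`, `is_green`, `watermark_z_score`, `p_value`, `greedy_green_sequence`, the `"owner-key"` string) are illustrative assumptions, not the authors' method.

```python
import hashlib
import math

GAMMA = 0.5  # ASSUMPTION: fraction of the vocabulary placed on the green list


def is_green(prev_token: int, token: int, secret_key: str = "owner-key") -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{secret_key}:{prev_token}:{token}".encode()).digest()
    # Map the first 8 bytes of the hash to a value in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < GAMMA


def watermark_z_score(tokens: list[int], secret_key: str = "owner-key") -> float:
    """z-score of the green-token count against the no-watermark null hypothesis."""
    n = len(tokens) - 1  # number of scored (previous, current) token pairs
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, secret_key) for p, t in zip(tokens, tokens[1:]))
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))


def p_value(z: float) -> float:
    """Upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def greedy_green_sequence(length: int, vocab_size: int = 1000,
                          secret_key: str = "owner-key") -> list[int]:
    """Toy stand-in for a watermarked response: every scored token is green."""
    tokens = [0]
    while len(tokens) < length:
        prev = tokens[-1]
        # Pick the first green token for this context (hypothetical generator).
        tokens.append(next(t for t in range(vocab_size)
                           if is_green(prev, t, secret_key)))
    return tokens


if __name__ == "__main__":
    z = watermark_z_score(greedy_green_sequence(101))
    print(f"z = {z:.2f}, p = {p_value(z):.2e}")
```

Under this assumed scheme, the owner would score responses to prompts from the chosen domain (for example, French) and expect a high z-score only there, while responses to prompts outside the domain should score near zero.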
Peer Reviews
Decision: ICLR 2026 Oral
S1. I think the application aside, the general result is that semantic domains can be used as a backdoor trigger, and it is robust across many settings, which is an interesting result.
S2. Experiments are thorough, containing analysis of many of the main points of their fingerprinting method, including many common deployment variations, such as the quantized or pruned model.
W1. I think that the setting is a bit questionable. At least in the US, whether LLMs are protected under copyright is uncertain (as some people argue model weights are just facts, and facts cannot be copyrighted), and enforcing a license attached to them is also very uncertain. Minor point though, I would just rewrite the motivation of the paper.
1. The paper is well-written and structured. The threat model where previous fingerprints remain less effective is well-defined, makes sense, and the motivation is clear.
2. The paper addresses a critical and timely problem, which is protecting the copyright of the open-weight LLM. The idea of applying the LLM watermark for fingerprint is simple, straightforward, yet effective.
3. The experimental results are strong. The proposed fingerprint method achieves a high success rate, and shows signif
1. **The requirement for high-entropy semantic domains and the risk of watermark leakage constrain the practical applicability and stealth of the method.** While the paper demonstrates successful fingerprinting in domains like French, Math, and Medicine, Figure 6 crucially shows that the detectability of the watermark is highly dependent on the entropy of the underlying domain. This creates a significant constraint that model providers cannot arbitrarily choose any semantic domain for fingerprin
- The paper's core idea is a significant conceptual leap over existing "query-key" fingerprinting. The direct injection of watermark output pattern into the model is clever. With the power of watermarks (e.g. robustness to post-editing), the detection becomes both robust and stealthy.
- The experiments are both solid and comprehensive.
- The presentation is good and easy to understand.
The paper seems good to me.
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Adversarial Robustness in Machine Learning · Advanced Steganography and Watermarking Techniques
