TL;DR
iSeal is a fingerprinting method for LLMs that remains effective even when the attacker controls the inference process, so ownership can still be verified under fingerprint unlearning and response manipulation.
Contribution
It is the first fingerprinting approach resilient to verification-time attacks, combining model and external features with error correction and similarity verification.
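To illustrate why similarity verification matters when a thief can alter outputs, here is a minimal sketch comparing exact-match checking with a thresholded similarity check. It is not iSeal's actual verifier: the function names, the example fingerprint string, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def exact_match(expected: str, response: str) -> bool:
    """Baseline check: passes only if the response equals the expected string."""
    return expected == response


def similarity_match(expected: str, response: str, threshold: float = 0.8) -> bool:
    """Hypothetical relaxed check: passes if the strings are similar enough.

    SequenceMatcher.ratio() returns a value in [0, 1]; the threshold is an
    illustrative assumption, not a value taken from the paper.
    """
    ratio = SequenceMatcher(None, expected, response).ratio()
    return ratio >= threshold


# Hypothetical fingerprint response and a lightly manipulated copy
# (the final character "1" is replaced by the letter "l").
expected = "fingerprint-7f3a9c21"
manipulated = "fingerprint-7f3a9c2l"
unrelated = "I cannot help with that."

print(exact_match(expected, manipulated))       # exact match breaks on one changed character
print(similarity_match(expected, manipulated))  # similarity check tolerates the small edit
print(similarity_match(expected, unrelated))    # an unrelated response is still rejected
```

The point of the sketch is only that a single-character edit defeats exact matching while a similarity threshold still accepts the response and rejects unrelated text.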
Findings
Achieves a 100% fingerprint success rate (FSR) on 12 LLMs against 10+ attacks.
Outperforms baselines under unlearning and response manipulation.
Resistant to collusion-based fingerprint unlearning.
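The summary does not define how FSR is computed. As a rough sketch, assuming it is the fraction of fingerprint queries whose responses pass verification, it could be calculated as below. The function name, the `verify` callback, and the toy trial data are hypothetical and are not results from the paper.

```python
from typing import Callable, Iterable, Tuple


def fingerprint_success_rate(
    pairs: Iterable[Tuple[str, str]],
    verify: Callable[[str, str], bool],
) -> float:
    """Fraction of (expected, response) pairs accepted by `verify`.

    Assumed definition of FSR, used here for illustration only.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one fingerprint query")
    passed = sum(1 for expected, response in pairs if verify(expected, response))
    return passed / len(pairs)


# Toy data: three of the four responses match the expected fingerprint.
trials = [
    ("key-a", "key-a"),
    ("key-b", "key-b"),
    ("key-c", "refused"),
    ("key-d", "key-d"),
]

fsr = fingerprint_success_rate(trials, lambda e, r: e == r)
print(f"FSR = {fsr:.0%}")  # FSR = 75%
```

Under this reading, a 100% FSR means every fingerprint query was verified successfully.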
Abstract
Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownership verification, LLM fingerprinting thus plays a vital role in addressing this challenge. Existing LLM fingerprinting methods verify ownership by extracting or injecting model-specific features. However, they overlook potential attacks during the verification process, leaving them ineffective when the model thief fully controls the LLM's inference process. In such settings, attackers may share prompt-response pairs to enable fingerprint unlearning or manipulate outputs to evade exact-match verification. We propose iSeal, the first fingerprinting method designed for reliable verification when the model thief controls the suspected LLM in an end-to-end manner. It injects unique features into both the…