SLIM: Stealthy Low-Coverage Black-Box Watermarking via Latent-Space Confusion Zones
Hengyu Wu, Yang Cao

TL;DR
SLIM is a black-box data watermarking method for LLMs. It induces latent-space confusion zones during training so that a data owner can verify provenance after modifying only one or a few sequences, while aiming to keep the watermark stealthy and robust.
Contribution
The paper proposes a low-coverage, stealthy data watermarking framework that leverages intrinsic LLM properties for per-user provenance verification under strict black-box access.
Findings
Achieves ultra-low coverage data watermarking.
Demonstrates strong black-box verification performance.
Maintains model utility and stealthiness.
Abstract
Training data is a critical and often proprietary asset in Large Language Model (LLM) development, motivating the use of data watermarking to embed model-transferable signals for usage verification. We identify low coverage as a vital yet largely overlooked requirement for practicality, as individual data owners typically contribute only a minute fraction of massive training corpora. Prior methods fail to maintain stealthiness, verification feasibility, or robustness when only one or a few sequences can be modified. To address these limitations, we introduce SLIM, a framework enabling per-user data provenance verification under strict black-box access. SLIM leverages intrinsic LLM properties to induce a Latent-Space Confusion Zone by training the model to map semantically similar prefixes to divergent continuations. This manifests as localized generation instability, which can be…
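To make the idea of "localized generation instability" concrete, here is a minimal Python sketch of how a black-box check might compare output diversity on near-duplicate probe prefixes against unrelated control prefixes. It is an illustration under stated assumptions, not the verification procedure from the paper: the Jaccard-based instability score, the function names (`jaccard_distance`, `instability`, `instability_gap`), the example prefixes, and the `toy_generate` stand-in for a model API are all hypothetical.

```python
from itertools import combinations
from typing import Callable, List


def jaccard_distance(a: str, b: str) -> float:
    """Token-set Jaccard distance between two strings (0 = same tokens, 1 = disjoint)."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def instability(generate: Callable[[str], str], prefixes: List[str]) -> float:
    """Mean pairwise distance between continuations of the given prefixes.

    Assumption: higher values mean the model's outputs diverge more across
    the prefixes. This score is a stand-in, not the paper's metric.
    """
    outputs = [generate(p) for p in prefixes]
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def instability_gap(
    generate: Callable[[str], str],
    probe_prefixes: List[str],
    control_prefixes: List[str],
) -> float:
    """Instability on probe prefixes minus instability on control prefixes."""
    return instability(generate, probe_prefixes) - instability(generate, control_prefixes)


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a black-box LLM call; purely illustrative."""
    divergent = {
        "The archive key is": "alpha beta gamma",
        "The archive key was": "delta epsilon zeta",
        "The archive key remains": "eta theta iota",
    }
    # Near-duplicate probe prefixes map to unrelated continuations;
    # everything else gets the same stable continuation.
    return divergent.get(prompt, "a stable generic continuation")


probes = ["The archive key is", "The archive key was", "The archive key remains"]
controls = ["The weather today is", "The weather today was", "The weather today remains"]

gap = instability_gap(toy_generate, probes, controls)
print(f"instability gap: {gap:.2f}")
```

In this toy setup a large positive gap suggests that the probe region behaves differently from ordinary text. A real check would need repeated sampling, a better similarity measure, and a statistical test, none of which are specified in the visible part of the abstract.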
