Ideal Attribution and Faithful Watermarks for Language Models
Min Jae Song, Kameron Shahabi

TL;DR
This paper proposes a formal framework for ideal attribution and watermarking in language models, providing a clear foundation for designing and evaluating attribution mechanisms with guaranteed properties.
Contribution
It introduces a formal abstraction called the ledger for deterministic attribution decisions and frames watermarking as a faithful representation of these ideal mechanisms.
Findings
Provides a unified language for attribution guarantees
Enables precise reasoning about watermarking desiderata
Sets a roadmap for future watermarking scheme development
Abstract
We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of the prompt-response interaction history between a model and its user. Each mechanism produces deterministic decisions based on the ledger and an explicit selection criterion, making it well-suited to serve as a ground truth for attribution. We frame the design goal of watermarking schemes as faithful representation of ideal attribution mechanisms. This novel perspective brings conceptual clarity, replacing piecemeal probabilistic statements with a unified language for stating the guarantees of each scheme. It also enables precise reasoning about desiderata for future watermarking schemes, even when no current construction achieves them, since the ideal functionalities are specified first. In this…
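To make the ledger idea concrete, here is a minimal sketch in Python. It is an illustration only, not the paper's formal definition: the class name `Ledger`, the function `attribute`, and the two example criteria (`exact_response_match`, `contains_logged_response`) are all hypothetical names chosen for this example, and the paper's actual selection criteria may differ. The sketch shows only the two properties stated in the abstract: the log is append-only, and the decision is a deterministic function of the ledger and an explicit selection criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One logged interaction: (prompt, response). Representation is an assumption.
Interaction = Tuple[str, str]


@dataclass
class Ledger:
    """Hypothetical append-only log of prompt-response interactions."""
    _entries: List[Interaction] = field(default_factory=list)

    def append(self, prompt: str, response: str) -> None:
        # The only mutating operation: entries are added, never edited or removed.
        self._entries.append((prompt, response))

    def entries(self) -> Tuple[Interaction, ...]:
        # Return an immutable snapshot so callers cannot rewrite history.
        return tuple(self._entries)


# A selection criterion maps (candidate string, ledger entries) to a decision.
Criterion = Callable[[str, Tuple[Interaction, ...]], bool]


def exact_response_match(candidate: str, entries: Tuple[Interaction, ...]) -> bool:
    # Illustrative criterion: attribute only strings identical to a logged response.
    return any(candidate == response for _, response in entries)


def contains_logged_response(candidate: str, entries: Tuple[Interaction, ...]) -> bool:
    # Illustrative criterion: attribute strings that embed a logged response verbatim.
    return any(response in candidate for _, response in entries if response)


def attribute(candidate: str, ledger: Ledger, criterion: Criterion) -> bool:
    # Deterministic: the same ledger, criterion, and candidate always give the same answer.
    return criterion(candidate, ledger.entries())
```

In this toy version, swapping the criterion changes which strings count as attributed while the ledger stays the same, which is one way to read the abstract's separation between the interaction history and the explicit selection criterion.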
Taxonomy
Topics: Advanced Steganography and Watermarking Techniques · Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
