A Statistical Hypothesis Testing Framework for Data Misappropriation Detection in Large Language Models
Yinpeng Cai, Lexin Li, Linjun Zhang

TL;DR
This paper introduces a statistical hypothesis testing framework that uses embedded watermarks to detect data misappropriation in large language models, addressing privacy and copyright concerns.
Contribution
It proposes embedding watermarks in training data and formulates misappropriation detection as a hypothesis testing problem, with proven optimality and empirical validation.
Findings
Effective detection of data misappropriation in LLMs
The proposed tests explicitly control type I and type II error rates
Empirical results demonstrate high detection accuracy
Abstract
Large Language Models (LLMs) have rapidly gained enormous popularity in recent years. However, the training of LLMs has raised significant privacy and legal concerns, particularly regarding the distillation and inclusion of copyrighted materials in their training data without proper attribution or licensing, an issue that falls under the broader concern of data misappropriation. In this article, we focus on a specific problem of data misappropriation detection, namely, determining whether a given LLM has incorporated data generated by another LLM. We propose embedding watermarks into the copyrighted training data and formulating the detection of data misappropriation as a hypothesis testing problem. We develop a general statistical testing framework, construct test statistics, determine optimal rejection thresholds, and explicitly control type I and type II errors. Furthermore, we…
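The pipeline in the abstract (embed a watermark, compute a test statistic, set a rejection threshold, control type I and type II errors) can be illustrated with a toy sketch. This is not the authors' construction: it assumes a keyed red/green vocabulary split in the style of Kirchenbauer et al. (2023), a one-sided z-test, and normal approximations instead of the paper's optimal thresholds. Under H0 (the suspect model never trained on watermarked data) its outputs should hit "green" tokens at the base rate `GAMMA`; under H1 the rate is elevated to some `p1`. The names `VOCAB_SIZE`, `GAMMA`, `KEY`, `toy_sequence`, and `min_tokens` are hypothetical.

```python
import hashlib
import math
import random
from statistics import NormalDist

VOCAB_SIZE = 50_000   # hypothetical vocabulary size
GAMMA = 0.25          # hypothetical fraction of the vocabulary marked "green"
KEY = "owner-secret"  # hypothetical watermark key held by the data owner


def is_green(prev_token: int, token: int, key: str = KEY, gamma: float = GAMMA) -> bool:
    """Keyed pseudorandom green/red split of the vocabulary, reseeded by the previous token."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform on [0, 1)
    return u < gamma


def green_count(tokens: list[int], key: str = KEY, gamma: float = GAMMA) -> tuple[int, int]:
    """Return (green tokens, scored tokens); the first token has no predecessor and is unscored."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(a, b, key, gamma) for a, b in pairs), len(pairs)


def z_statistic(num_green: int, n: int, gamma: float = GAMMA) -> float:
    """Standardized green count; approximately N(0, 1) under H0."""
    return (num_green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def rejection_threshold(alpha: float) -> float:
    """One-sided critical value: rejecting H0 when z exceeds it gives type I error of about alpha."""
    return NormalDist().inv_cdf(1 - alpha)


def approx_power(n: int, p1: float, alpha: float, gamma: float = GAMMA) -> float:
    """Normal-approximation power (1 - type II error) when the true green rate is p1 > gamma."""
    crit = gamma * n + rejection_threshold(alpha) * math.sqrt(n * gamma * (1 - gamma))
    return 1 - NormalDist().cdf((crit - p1 * n) / math.sqrt(n * p1 * (1 - p1)))


def min_tokens(p1: float, alpha: float, beta: float, gamma: float = GAMMA) -> int:
    """Smallest n with type I error alpha and type II error at most beta (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(1 - beta)
    root_n = (z_a * math.sqrt(gamma * (1 - gamma)) + z_b * math.sqrt(p1 * (1 - p1))) / (p1 - gamma)
    return math.ceil(root_n**2)


def toy_sequence(n: int, delta: float, rng: random.Random) -> list[int]:
    """Hypothetical generator: with probability delta force a green next token, else sample
    uniformly. delta = 0 mimics output carrying no watermark signal."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < n:
        cand = rng.randrange(VOCAB_SIZE)
        if rng.random() < delta:
            while not is_green(tokens[-1], cand):  # rejection-sample a green token
                cand = rng.randrange(VOCAB_SIZE)
        tokens.append(cand)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 0.01
    for delta in (0.0, 0.3):
        g, n = green_count(toy_sequence(500, delta, rng))
        z = z_statistic(g, n)
        print(f"delta={delta}: z={z:.2f}, reject H0: {z > rejection_threshold(alpha)}")
    print("tokens needed (alpha=0.01, beta=0.05, p1=0.35):", min_tokens(0.35, 0.01, 0.05))
```

The design choice here is to fix the type I error first (through `rejection_threshold`) and then size the query budget so that the type II error stays below `beta` for an assumed effect size `p1`. In a real setting `p1` would have to be estimated or bounded, since a model fine-tuned on watermarked data may inherit only a weak green-token bias.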
