As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files
Haodong Li, Jingqi Zhang, Xiao Cheng, Peihua Mai, Haoyu Wang, Yan Pang

TL;DR
COPYCHECK is a novel framework that uses uncertainty signals from LLMs to accurately detect whether specific content was part of their training data, addressing limitations of previous methods.
Contribution
It introduces a new approach leveraging LLM overconfidence and uncertainty patterns for copyright detection, with strategies to improve robustness and threshold independence.
Findings
Achieves over 90% balanced accuracy on LLaMA 7b and LLaMA2 7b
Achieves over 90% relative improvement over the state-of-the-art baseline
Generalizes well across different LLM architectures
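For readers unfamiliar with the metric, balanced accuracy is the mean of the recall on each class, so a detector cannot score well by always predicting the majority class. The sketch below is a generic illustration, not code from the paper; the function name `balanced_accuracy` and the 1 = seen, 0 = unseen label encoding are assumptions made for this example.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive (seen = 1) and negative (unseen = 0) classes."""
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    # True positives: seen files correctly flagged as seen.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    # True negatives: unseen files correctly flagged as unseen.
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


# Hypothetical imbalanced set: 2 seen files, 8 unseen files.
labels = [1] * 2 + [0] * 8
always_unseen = [0] * 10  # a trivial detector that never says "seen"

plain_accuracy = sum(t == p for t, p in zip(labels, always_unseen)) / len(labels)
print(plain_accuracy)                            # 0.8
print(balanced_accuracy(labels, always_unseen))  # 0.5
```

The trivial detector reaches 80% plain accuracy but only 50% balanced accuracy, which is why the balanced figure is the more informative one when seen and unseen files are not evenly represented.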
Abstract
The remarkable language ability of Large Language Models (LLMs) stems from extensive training on vast datasets, often including copyrighted material, which raises serious concerns about unauthorized use. While Membership Inference Attacks (MIAs) offer potential solutions for detecting such violations, existing approaches face critical limitations and challenges due to LLMs' inherent overconfidence, limited access to ground truth training data, and reliance on empirically determined thresholds. We present COPYCHECK, a novel framework that leverages uncertainty signals to detect whether copyrighted content was used in LLM training sets. Our method turns LLM overconfidence from a limitation into an asset by capturing uncertainty patterns that reliably distinguish between "seen" (training data) and "unseen" (non-training data) content. COPYCHECK further implements a two-fold strategy:…
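To make "uncertainty signal" concrete, one common choice is the Shannon entropy of the model's next-token distribution, averaged over a text. The sketch below illustrates that idea only; it is not COPYCHECK's pipeline, since the abstract above is truncated before its strategy is described. The function names and the toy distributions are hypothetical, and the assumption that seen text yields lower entropy is inferred from the paper's title rather than from reported numbers.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions):
    """Average per-token entropy over a sequence of distributions."""
    if not distributions:
        raise ValueError("need at least one distribution")
    return sum(token_entropy(d) for d in distributions) / len(distributions)


# Toy stand-ins for model outputs (assumed, not measured):
# a confident model puts almost all mass on one token at each position,
# an uncertain model spreads mass evenly over the candidates.
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3
flat = [[0.25, 0.25, 0.25, 0.25]] * 3

print(mean_entropy(peaked) < mean_entropy(flat))  # True
```

In a real setting the distributions would come from the target model's softmax outputs over its vocabulary, and the score would feed a downstream decision rule rather than being compared by hand.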
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
