Hey, That's My Data! Token-Only Dataset Inference in Large Language Models
Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang

TL;DR
This paper introduces CatShift, a token-only dataset inference method that detects whether a dataset was used to train a large language model by analyzing the output shifts caused by fine-tuning, without requiring access to logits or other internal signals.
Contribution
The paper presents a novel token-only inference framework based on catastrophic forgetting, enabling dataset membership detection without logit access.
Findings
CatShift effectively detects dataset inclusion in both open-source and API-based LLMs.
Fine-tuning on data the model has already been trained on causes larger output shifts than fine-tuning on unseen data, which is the signal used for inference.
The method works without access to internal logits, making it practical for proprietary models.
Abstract
Large Language Models (LLMs) rely on massive training datasets, often including proprietary data, which raises concerns about unauthorized usage and copyright infringement. Existing dataset inference methods typically require access to log probabilities or other internal signals, but many modern LLMs restrict such access, motivating token-only inference approaches. We propose CatShift, a token-only dataset inference framework based on catastrophic forgetting, where models overwrite prior knowledge when trained on new data. Fine-tuning an LLM on a subset of its training data induces larger output shifts than fine-tuning on unseen data. CatShift compares these shifts against those from a known non-member validation set to infer whether a dataset was included in training. Experiments on both open-source and API-based LLMs show that CatShift remains effective without logit access, enabling…
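The comparison step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes you already hold, for each prompt, the model's completion before and after fine-tuning, and the names `output_shift`, `shift_scores`, and `permutation_p_value` are hypothetical. The word-level similarity measure and the permutation test are stand-ins chosen for illustration; the source does not specify which similarity score or statistical test CatShift uses.

```python
import random
from difflib import SequenceMatcher
from statistics import mean


def output_shift(before: str, after: str) -> float:
    """Shift between two completions of the same prompt.

    Assumption: 1 - word-level SequenceMatcher ratio, so 0.0 means the
    completions are identical and 1.0 means they share no words.
    """
    return 1.0 - SequenceMatcher(None, before.split(), after.split()).ratio()


def shift_scores(pairs):
    """Map (before, after) completion pairs to per-example shift scores."""
    return [output_shift(before, after) for before, after in pairs]


def permutation_p_value(suspect, validation, n_perm=10_000, seed=0):
    """One-sided permutation test (an assumed choice of test).

    Null hypothesis: suspect shifts are no larger on average than the
    shifts measured on the known non-member validation set.
    """
    rng = random.Random(seed)
    observed = mean(suspect) - mean(validation)
    pooled = list(suspect) + list(validation)
    k = len(suspect)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value would suggest that the suspect dataset shifts more than the known non-members do, which under the paper's premise is evidence that it was part of training. Token-only access is enough here because the only inputs are generated text before and after fine-tuning.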
