Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework
Hongyi Tang, Zhihao Zhu, Yi Yang

TL;DR
This paper presents NA-PDD, a neuron activation-based framework for detecting whether specific data was part of an LLM's pre-training corpus. It targets the legal and ethical concerns raised by training data and reports better detection performance than existing methods.
Contribution
The paper introduces NA-PDD, a novel neuron activation analysis algorithm, and CCNewsPDD, a rigorous benchmark for pre-training data detection in LLMs.
Findings
NA-PDD outperforms existing detection methods across multiple benchmarks.
Neuron activation patterns differ significantly between training and non-training data.
The new benchmark, CCNewsPDD, is designed to be temporally unbiased so that detection methods are evaluated fairly.
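The finding that training and non-training data activate different neurons can be illustrated with a toy sketch. This is not the authors' NA-PDD algorithm, whose details are not given in this summary; it is a minimal illustration under stated assumptions. It assumes per-sample neuron activations have already been extracted into plain lists, that a neuron counts as "active" when its value exceeds a threshold, and that a small reference set of known training and non-training samples is available. The function names (activation_frequency, select_neurons, membership_score) are hypothetical.

```python
from typing import List, Sequence, Tuple


def activation_frequency(acts: Sequence[Sequence[float]],
                         threshold: float = 0.0) -> List[float]:
    """Fraction of samples on which each neuron exceeds the threshold."""
    n_samples = len(acts)
    n_neurons = len(acts[0])
    return [
        sum(1 for row in acts if row[j] > threshold) / n_samples
        for j in range(n_neurons)
    ]


def select_neurons(member_acts: Sequence[Sequence[float]],
                   nonmember_acts: Sequence[Sequence[float]],
                   top_k: int = 1,
                   threshold: float = 0.0) -> Tuple[List[int], List[int]]:
    """Pick neurons that fire more often on one reference set than the other.

    Returns (member_neurons, nonmember_neurons) as lists of neuron indices.
    """
    f_in = activation_frequency(member_acts, threshold)
    f_out = activation_frequency(nonmember_acts, threshold)
    diff = [a - b for a, b in zip(f_in, f_out)]
    # Sort neuron indices by (member frequency - non-member frequency).
    order = sorted(range(len(diff)), key=lambda j: diff[j])
    return order[-top_k:], order[:top_k]


def membership_score(sample: Sequence[float],
                     member_neurons: Sequence[int],
                     nonmember_neurons: Sequence[int],
                     threshold: float = 0.0) -> float:
    """Higher score means the sample looks more like the training reference set."""
    def fired_fraction(indices: Sequence[int]) -> float:
        return sum(1 for j in indices if sample[j] > threshold) / len(indices)

    return fired_fraction(member_neurons) - fired_fraction(nonmember_neurons)


# Toy reference activations (rows: samples, columns: 4 hypothetical neurons).
MEMBER_ACTS = [[1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
NONMEMBER_ACTS = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0]]

member_neurons, nonmember_neurons = select_neurons(MEMBER_ACTS, NONMEMBER_ACTS)
score = membership_score([0.9, 0.0, 0.2, 0.0], member_neurons, nonmember_neurons)
print(member_neurons, nonmember_neurons, score)
```

In this toy setup a sample is flagged as likely training data when its score exceeds a cutoff chosen on held-out reference data. With a real model, the activations would come from a chosen layer during inference, and the selection and scoring rules would follow the paper rather than this sketch.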
Abstract
The performance of large language models (LLMs) is closely tied to their training data, which can include copyrighted material or private information, raising legal and ethical concerns. Additionally, LLMs face criticism for dataset contamination and internalizing biases. To address these issues, the Pre-Training Data Detection (PDD) task was proposed to identify if specific data was included in an LLM's pre-training corpus. However, existing PDD methods often rely on superficial features like prediction confidence and loss, resulting in mediocre performance. To improve this, we introduce NA-PDD, a novel algorithm analyzing differential neuron activation patterns between training and non-training data in LLMs. This is based on the observation that these data types activate different neurons during LLM inference. We also introduce CCNewsPDD, a temporally unbiased benchmark employing…
Taxonomy
Topics: Neural Networks and Applications
