Identifying Pre-training Data in LLMs: A Neuron Activation-Based Detection Framework

Hongyi Tang; Zhihao Zhu; Yi Yang

arXiv:2507.16414 · cs.AI · July 23, 2025

TL;DR

This paper presents NA-PDD, a neuron activation-based framework for detecting whether specific data was part of an LLM's pre-training set. The work is motivated by ethical and legal concerns about training data, and it reports better detection accuracy than existing methods.

Contribution

The paper introduces NA-PDD, a novel neuron activation analysis algorithm, and CCNewsPDD, a rigorous benchmark for pre-training data detection in LLMs.

Findings

01. NA-PDD outperforms existing detection methods across multiple benchmarks.

02. Neuron activation patterns differ significantly between training and non-training data.

03. The new benchmark, CCNewsPDD, keeps the temporal distribution of training and non-training data consistent so that evaluation is fair.

Abstract

The performance of large language models (LLMs) is closely tied to their training data, which can include copyrighted material or private information, raising legal and ethical concerns. Additionally, LLMs face criticism for dataset contamination and internalizing biases. To address these issues, the Pre-Training Data Detection (PDD) task was proposed to identify if specific data was included in an LLM's pre-training corpus. However, existing PDD methods often rely on superficial features like prediction confidence and loss, resulting in mediocre performance. To improve this, we introduce NA-PDD, a novel algorithm analyzing differential neuron activation patterns between training and non-training data in LLMs. This is based on the observation that these data types activate different neurons during LLM inference. We also introduce CCNewsPDD, a temporally unbiased benchmark employing…
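The core idea in the abstract, that training and non-training data activate different neurons, can be illustrated with a toy sketch. The abstract is truncated and does not specify the NA-PDD algorithm, so everything below is an assumption made for illustration: the function names, the activation threshold, the margin used to pick discriminative neurons, and the scoring rule are all hypothetical, and activations are plain lists of floats rather than values read from a real LLM.

```python
from typing import List, Set, Tuple

Activations = List[float]  # one activation value per neuron for a single text sample


def activation_rate(samples: List[Activations], threshold: float) -> List[float]:
    """Fraction of samples in which each neuron's activation exceeds the threshold."""
    n_neurons = len(samples[0])
    counts = [0] * n_neurons
    for sample in samples:
        for i, value in enumerate(sample):
            if value > threshold:
                counts[i] += 1
    return [c / len(samples) for c in counts]


def select_discriminative_neurons(
    members: List[Activations],
    non_members: List[Activations],
    threshold: float = 0.0,   # hypothetical cutoff for "active"
    margin: float = 0.5,      # hypothetical minimum gap in activation rate
) -> Tuple[Set[int], Set[int]]:
    """Pick neurons that fire much more often for one reference group than the other."""
    member_rate = activation_rate(members, threshold)
    non_member_rate = activation_rate(non_members, threshold)
    member_neurons = {
        i for i, (m, n) in enumerate(zip(member_rate, non_member_rate)) if m - n >= margin
    }
    non_member_neurons = {
        i for i, (m, n) in enumerate(zip(member_rate, non_member_rate)) if n - m >= margin
    }
    return member_neurons, non_member_neurons


def membership_score(
    sample: Activations,
    member_neurons: Set[int],
    non_member_neurons: Set[int],
    threshold: float = 0.0,
) -> float:
    """Share of the sample's active discriminative neurons that are member-associated.

    Returns 0.5 when no discriminative neuron is active (no evidence either way).
    """
    active = {i for i, value in enumerate(sample) if value > threshold}
    hits_member = len(active & member_neurons)
    hits_non_member = len(active & non_member_neurons)
    total = hits_member + hits_non_member
    return 0.5 if total == 0 else hits_member / total


# Toy reference sets: 3 samples each, 4 neurons per sample (made-up numbers).
members = [[1.0, 0.0, 0.2, 0.0], [0.8, 0.0, 0.0, 0.0], [0.9, 0.1, 0.3, 0.0]]
non_members = [[0.0, 0.7, 0.2, 0.0], [0.0, 0.9, 0.0, 0.0], [0.1, 0.6, 0.4, 0.0]]

member_neurons, non_member_neurons = select_discriminative_neurons(members, non_members)
print(member_neurons, non_member_neurons)  # {0} {1}
print(membership_score([0.9, 0.0, 0.1, 0.0], member_neurons, non_member_neurons))  # 1.0
```

In this toy setup, neuron 0 fires mostly on the member samples and neuron 1 mostly on the non-member samples, so a new sample that activates neuron 0 but not neuron 1 scores as likely training data. A real detector would need activations taken from the model under test and a principled way to set the threshold and margin, neither of which this excerpt describes.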

Taxonomy

Topics: Neural Networks and Applications