Blackbox Model Provenance via Palimpsestic Membership Inference

Rohith Kuditipudi; Jing Huang; Sally Zhu; Diyi Yang; Christopher Potts; Percy Liang

arXiv:2510.19796 · cs.LG · October 23, 2025

TL;DR

This paper introduces a statistical method to determine whether a blackbox language model or generated text originates from a specific training run, leveraging palimpsestic memorization and correlation testing.

Contribution

The paper formulates provenance verification as an independence test and shows that use of a model can be detected by testing for correlation between Bob's model or text and the order of examples in Alice's training run.

Findings

01 High statistical significance in the query setting, with p-values on the order of 1e-8

02 Reliable detection from Bob's text with as few as a few hundred tokens

03 Effective distinction between models trained on the original data order and models trained on reshuffled data

Abstract

Suppose Alice trains an open-weight language model and Bob uses a blackbox derivative of Alice's model to produce text. Can Alice prove that Bob is using her model, either by querying Bob's derivative model (query setting) or from the text alone (observational setting)? We formulate this question as an independence testing problem--in which the null hypothesis is that Bob's model or text is independent of Alice's randomized training run--and investigate it through the lens of palimpsestic memorization in language models: models are more likely to memorize data seen later in training, so we can test whether Bob is using Alice's model using test statistics that capture correlation between Bob's model or text and the ordering of training examples in Alice's training run. If Alice has randomly shuffled her training data, then any significant correlation amounts to exactly quantifiable…
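The query-setting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of Spearman rank correlation as the test statistic, and the permutation-based p-value are all assumptions made for illustration. The sketch assumes Alice knows the position of each example in her shuffled training order and can obtain a log-likelihood for each example from Bob's model.

```python
import random

def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def permutation_p_value(train_positions, log_likelihoods,
                        num_permutations=1000, seed=0):
    """One-sided permutation test (hypothetical helper).

    Null hypothesis: the log-likelihoods are independent of the training
    order. Because Alice shuffled her data uniformly at random, re-shuffling
    the positions simulates the null distribution of the statistic.
    """
    rng = random.Random(seed)
    observed = spearman(train_positions, log_likelihoods)
    shuffled = list(train_positions)
    at_least_as_large = 0
    for _ in range(num_permutations):
        rng.shuffle(shuffled)
        if spearman(shuffled, log_likelihoods) >= observed:
            at_least_as_large += 1
    # The +1 terms keep the p-value valid and strictly positive.
    return observed, (at_least_as_large + 1) / (num_permutations + 1)

if __name__ == "__main__":
    # Synthetic data only: later examples get slightly higher log-likelihood.
    rng = random.Random(1)
    positions = list(range(200))
    scores = [0.05 * p + rng.gauss(0, 1) for p in positions]
    print(permutation_p_value(positions, scores))
```

The test is one-sided because the abstract's premise is that examples seen later in training are more likely to be memorized, so dependence should appear as a positive correlation. A permutation test cannot report a p-value below 1 / (num_permutations + 1), so very small p-values would need an analytic null distribution instead.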

Videos

Blackbox Model Provenance via Palimpsestic Membership Inference · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Scientific Computing and Data Management