Independence Tests for Language Models
Sally Zhu, Ahmed Ahmed, Rohith Kuditipudi, Percy Liang

TL;DR
This paper develops statistical tests for whether two language models were trained independently. The tests cover a constrained setting, which makes assumptions about model architecture and training, and a more flexible, adversarial unconstrained setting; they can also detect shared components and dependencies between models.
Contribution
It introduces a family of tests that yield exact p-values in the constrained setting and a robust activation-matching test for the unconstrained setting.
Findings
Successfully identified all non-independent model pairs in experiments.
Effective even when models were fine-tuned or retrained in parts.
Able to detect shared components and model dependencies.
Abstract
We consider the following problem: given the weights of two models, can we test whether they were trained independently -- i.e., from independent random initializations? We consider two settings: constrained and unconstrained. In the constrained setting, we make assumptions about model architecture and training and propose a family of statistical tests that yield exact p-values with respect to the null hypothesis that the models are trained from independent random initializations. These p-values are valid regardless of the composition of either model's training data; we compute them by simulating exchangeable copies of each model under our assumptions and comparing various similarity measures of weights and activations between the original two models versus these copies. We report the p-values from these tests on pairs of 21 open-weight models (210 total pairs) and correctly identify…
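The p-value computation described in the abstract (simulate exchangeable copies of a model, then compare a similarity measure on the original pair against the copies) can be sketched as a standard permutation test. This is a minimal illustration, not the authors' implementation. It assumes each model is reduced to a single weight matrix whose rows are hidden units, that an exchangeable copy is obtained by randomly permuting those rows, and that cosine similarity is the similarity measure. The names `similarity` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def similarity(w_a, w_b):
    """Cosine similarity between two flattened weight matrices."""
    a, b = w_a.ravel(), w_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def permutation_p_value(w_a, w_b, num_copies=99, seed=0):
    """Permutation-test p-value for the null that w_a and w_b are independent.

    Assumption (for illustration only): permuting the rows (hidden units)
    of w_a yields a copy that is exchangeable with w_a under the null.
    """
    rng = np.random.default_rng(seed)
    observed = similarity(w_a, w_b)

    # Similarity of each simulated copy of model A to model B.
    copy_stats = [
        similarity(w_a[rng.permutation(w_a.shape[0])], w_b)
        for _ in range(num_copies)
    ]

    # Count copies at least as similar to model B as the original is.
    num_ge = sum(s >= observed for s in copy_stats)

    # The +1 terms count the original model among the exchangeable draws,
    # which is what makes the p-value valid at finite num_copies.
    return (1 + num_ge) / (1 + num_copies)
```

With `num_copies=99`, the smallest attainable p-value is 1/100, so rejecting at a stricter level requires simulating more copies. A pair in which one matrix is a lightly perturbed version of the other should reach that minimum, while an independently drawn pair should give a p-value that is roughly uniform on the attainable grid.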
Taxonomy
Topics: Natural Language Processing Techniques
Methods: LLaMA
