Approximating Language Model Training Data from Weights

John X. Morris; Junjie Oscar Yin; Woojeong Kim; Vitaly Shmatikov; Alexander M. Rush

arXiv:2506.15553·cs.CL·June 19, 2025

Open Access

TL;DR

This paper introduces a gradient-based method that approximates a language model's training data from its weights by selecting the best-matching documents from a large public text corpus, so that a comparable model can be trained without access to the original training data.

Contribution

The paper formalizes the problem of data approximation from model weights, proposes several baselines and metrics, and shows that useful data can be recovered, and model performance improved, given only model weights.

Findings

01

Recovers useful data from a public corpus given only the weights of the original and finetuned models.

02

Improves classification accuracy on AG News from 65% (using randomly selected data) to 80%, approaching the expert benchmark of 88%.

03

Reduces perplexity from 3.3 to 2.3 on MSMARCO data.

Abstract

Modern language models often have open weights but closed training data. We formalize the problem of data approximation from model weights and propose several baselines and metrics. We develop a gradient-based approach that selects the highest-matching data from a large public text corpus and show its effectiveness at recovering useful data given only the weights of the original and finetuned models. Even when none of the true training data is known, our method is able to locate a small subset of public Web documents that can be used to train a model to close to the original model's performance, for models trained for both classification and supervised finetuning. On the AG News classification task, our method improves performance from 65% (using randomly selected data) to 80%, approaching the expert benchmark of 88%. When applied to a model trained with SFT on MSMARCO web documents, our method…
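
To make the idea of gradient-based selection concrete, the following is a minimal sketch on a toy logistic-regression model. It is not the authors' implementation: the scoring rule (cosine similarity between each candidate's descent direction and the weight difference between the finetuned and original model), the function names `per_example_grads` and `select_by_gradient_match`, and the assumption that candidates come with labels are all illustrative assumptions.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Gradient of the logistic loss w.r.t. w, one row per candidate example."""
    p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities at weights w
    return (p - y)[:, None] * X        # shape: (n_examples, n_features)

def select_by_gradient_match(w_base, w_ft, X, y, k):
    """Rank candidates by how well a gradient step on them matches w_ft - w_base.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    cosine similarity between the descent direction -g_i and the weight update.
    """
    delta = w_ft - w_base
    g = per_example_grads(w_base, X, y)
    denom = np.linalg.norm(g, axis=1) * np.linalg.norm(delta) + 1e-12
    scores = (-g @ delta) / denom
    order = np.argsort(-scores)        # highest-matching candidates first
    return order[:k], scores

# Toy usage: three candidate examples, two features.
w_base = np.zeros(2)
w_ft = np.array([1.0, 0.0])            # "finetuned" weights moved along feature 0
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, 1.0])
top, scores = select_by_gradient_match(w_base, w_ft, X, y, k=2)
```

In this toy setup the first candidate pushes the weights in the same direction as the observed update, so it ranks highest. A real language model would need per-example gradients over far more parameters, which is where most of the practical difficulty lies.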

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling