TL;DR
This paper introduces MSA, a novel unlearning algorithm for large language models that leverages model state history to efficiently remove the influence of specific data points, improving data privacy and model flexibility.
Contribution
MSA is a new unlearning method that uses prior model checkpoints to effectively erase targeted data influence without full retraining.
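The summary here does not spell out MSA's update rule, so the following is only a minimal sketch of one plausible reading of "model state arithmetic", not the authors' implementation. It assumes a checkpoint saved before the model saw the target data is briefly fine-tuned on that data, the resulting weight change is treated as a "forget vector", and a scaled copy of that vector is subtracted from the final model. The names `StateDict`, `forget_vector`, `apply_unlearning`, and the scale `alpha` are all hypothetical, and plain lists of floats stand in for real model weights.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's weights: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def forget_vector(checkpoint: StateDict, tuned_on_forget: StateDict) -> StateDict:
    """Weight change caused by fine-tuning an earlier checkpoint on the forget set.

    Assumption: this difference approximates the influence of the forget data.
    """
    return {
        name: [t - c for t, c in zip(tuned_on_forget[name], checkpoint[name])]
        for name in checkpoint
    }


def apply_unlearning(final_model: StateDict, vector: StateDict, alpha: float = 1.0) -> StateDict:
    """Subtract the scaled forget vector from the final model's weights.

    `alpha` is a hypothetical strength knob; the source gives no such value.
    """
    return {
        name: [w - alpha * v for w, v in zip(final_model[name], vector[name])]
        for name in final_model
    }


# Toy values chosen so the arithmetic is easy to follow by hand.
checkpoint = {"w": [1.0, 2.0]}   # saved before the target data was seen
tuned = {"w": [1.5, 2.0]}        # same checkpoint after fine-tuning on the forget set
final = {"w": [2.0, 3.0]}        # fully trained model to be edited

vec = forget_vector(checkpoint, tuned)
unlearned = apply_unlearning(final, vec, alpha=1.0)
```

The point of the sketch is the cost profile described above: the only training involved is a short fine-tuning run on the forget set from an existing checkpoint, and the edit itself is elementwise arithmetic on weights, so no full retraining is needed.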
Findings
MSA matches or outperforms existing unlearning algorithms on multiple benchmarks, including data erasure tasks.
MSA demonstrates effectiveness across various models and evaluation metrics.
Abstract
Large language models are trained on massive corpora of web data, which may include private data, copyrighted material, factually inaccurate data, or data that degrades model performance. Eliminating the influence of such problematic datapoints on a model through complete retraining -- by repeatedly pretraining the model on datasets that exclude these specific instances -- is computationally prohibitive. To address this, unlearning algorithms have been proposed that aim to eliminate the influence of particular datapoints at a low computational cost, while leaving the rest of the model intact. However, precisely unlearning the influence of data on a large language model has proven to be a major challenge. In this work, we propose a new algorithm, MSA (Model State Arithmetic), for unlearning datapoints in large language models. MSA utilizes prior model checkpoints -- artifacts that…