Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models

Abdullah X

arXiv:2508.12220·cs.LG·August 19, 2025

Open Access

TL;DR

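This paper presents a reproducible systems approach to unlearning training data from large language models under GDPR Art. 17. Deterministic replay of the training tail yields parameters bit-identical to retain-set training when its preconditions hold, and complementary paths address latency and availability constraints.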

Contribution

It introduces a deterministic, log-based replay approach to unlearning in large models, complemented by exact reverts of recent steps, cohort-scoped adapter deletion, and curvature-guided anti-updates.

Findings

01. Achieved byte-identical model states after unlearning in controlled experiments

02. Developed a minimal logging system for deterministic training replay

03. Demonstrated practical unlearning methods balancing latency and accuracy

Abstract

We study the right to be forgotten (GDPR Art. 17) for large language models and frame unlearning as a reproducible systems problem. Our approach treats training as a deterministic program and logs a minimal per-microbatch record (ordered ID hash, RNG seed, learning-rate value, optimizer-step counter, and accumulation boundary). Under a pinned stack and deterministic kernels, replaying the training tail while filtering only the forget closure yields the same parameters as training on the retain set (bit-identical in the training dtype) when preconditions hold. To meet latency and availability constraints, we add complementary paths: (i) exact reverts of recent steps via micro-checkpoints or dense per-step deltas, (ii) cohort-scoped adapter deletion when the base is frozen, and (iii) a curvature-guided anti-update followed by a short retain-tune, audit-gated with escalation to exact…
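
The replay idea described in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: it trains a single scalar weight with pure Python floats instead of a pinned stack with deterministic kernels, and the names `MicrobatchRecord`, `build_log`, `replay`, and `strip` are hypothetical. The abstract lists only an ordered ID hash in the record; keeping the IDs next to the hash, keying the noise on (seed, sample ID), and skipping the update when a window is emptied are all assumptions made here for simplicity.

```python
import hashlib
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MicrobatchRecord:
    ids: tuple            # ordered sample IDs (assumption: kept next to the hash)
    id_hash: str          # hash of the ordered IDs, checked at replay time
    seed: int             # RNG seed for this microbatch
    lr: float             # learning-rate value in effect
    opt_step: int         # optimizer-step counter
    accum_boundary: bool  # True if the optimizer steps after this microbatch


def hash_ids(ids):
    return hashlib.sha256(",".join(map(str, ids)).encode()).hexdigest()


def build_log(order, micro=2, accum=2, base_lr=0.05):
    """Hypothetical logger: one record per microbatch of a fixed schedule."""
    chunks = [tuple(order[i:i + micro]) for i in range(0, len(order), micro)]
    log = []
    for k, ids in enumerate(chunks):
        step = k // accum
        boundary = (k + 1) % accum == 0 or k == len(chunks) - 1
        log.append(MicrobatchRecord(ids, hash_ids(ids), seed=1000 + k,
                                    lr=base_lr / (1 + step), opt_step=step,
                                    accum_boundary=boundary))
    return log


def replay(data, log, forget=frozenset(), w=0.0):
    """Re-run the logged schedule, skipping samples in the forget set."""
    g_sum, n = 0.0, 0
    for rec in log:
        assert hash_ids(rec.ids) == rec.id_hash, "log does not match IDs"
        for sid in rec.ids:
            if sid in forget:
                continue
            x, y = data[sid]
            # Noise is keyed on (seed, id), so filtering a sample never
            # shifts the random stream seen by the remaining samples.
            noise = random.Random(f"{rec.seed}:{sid}").gauss(0.0, 0.01)
            g_sum += (w * x - y) * x + noise  # grad of 0.5 * (w*x - y)^2
            n += 1
        if rec.accum_boundary:
            if n:  # assumption: an emptied window performs no update
                w -= rec.lr * g_sum / n
            g_sum, n = 0.0, 0
    return w


def strip(log, forget):
    """Reference schedule with the forget IDs physically removed."""
    out = []
    for rec in log:
        ids = tuple(i for i in rec.ids if i not in forget)
        out.append(replace(rec, ids=ids, id_hash=hash_ids(ids)))
    return out


data = {i: (float(i % 5 + 1), 2.0 * (i % 5 + 1)) for i in range(12)}
order = list(range(12))
random.Random(0).shuffle(order)
log = build_log(order)
forget = frozenset({3, 7})

w_full = replay(data, log)                 # original training run
w_unlearned = replay(data, log, forget)    # replay with filtering
retain_data = {k: v for k, v in data.items() if k not in forget}
w_retain = replay(retain_data, strip(log, forget))  # retain-only training

print(w_unlearned.hex() == w_retain.hex())
```

In this toy, filtered replay and retain-only training execute the same floating-point operations in the same order, which is why the two results match bit for bit. On real hardware that property depends on the preconditions the abstract names (a pinned stack and deterministic kernels).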


Taxonomy

Topics: Artificial Intelligence in Law