Unlearning at Scale: Implementing the Right to be Forgotten in Large Language Models
Abdullah X

TL;DR
This paper presents a reproducible system for unlearning data from large language models to comply with GDPR Art. 17. Deterministic replay reproduces the retain-set parameters exactly when its preconditions hold, and complementary faster paths address latency and availability constraints.
Contribution
It introduces a deterministic, log-based approach for efficient unlearning in large models, combining exact reverts, adapter deletion, and anti-updates for compliance.
Findings
Achieved byte-identical model states after unlearning in controlled experiments
Developed a minimal logging system for deterministic training replay
Demonstrated practical unlearning methods balancing latency and accuracy
Abstract
We study the right to be forgotten (GDPR Art. 17) for large language models and frame unlearning as a reproducible systems problem. Our approach treats training as a deterministic program and logs a minimal per-microbatch record (ordered ID hash, RNG seed, learning-rate value, optimizer-step counter, and accumulation boundary). Under a pinned stack and deterministic kernels, replaying the training tail while filtering only the forget closure yields the same parameters as training on the retain set (bit-identical in the training dtype) when preconditions hold. To meet latency and availability constraints, we add complementary paths: (i) exact reverts of recent steps via micro-checkpoints or dense per-step deltas, (ii) cohort-scoped adapter deletion when the base is frozen, and (iii) a curvature-guided anti-update followed by a short retain-tune, audit-gated with escalation to exact…
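The replay idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. The record fields follow the abstract's list, but everything else is an assumption made for illustration: the names `MicrobatchRecord`, `make_log`, and `run`, the scalar "model", the per-sample noise derived from the logged seed, and the choice to store the ordered IDs next to their hash so that replay can filter them.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MicrobatchRecord:
    ordered_ids: Tuple[int, ...]  # assumption: IDs stored beside their hash
    ids_hash: str                 # hash of the ordered sample IDs
    rng_seed: int
    lr: float
    opt_step: int
    accum_boundary: bool          # True if the optimizer steps after this microbatch


def hash_ids(ids):
    return hashlib.sha256(",".join(map(str, ids)).encode()).hexdigest()


def make_log(ids, microbatch_size, accum_steps, base_seed, lr):
    """Build a hypothetical training schedule as a list of log records."""
    records = []
    for k, start in enumerate(range(0, len(ids), microbatch_size)):
        chunk = tuple(ids[start:start + microbatch_size])
        boundary = (k + 1) % accum_steps == 0
        records.append(MicrobatchRecord(chunk, hash_ids(chunk), base_seed + k,
                                        lr, k // accum_steps, boundary))
    return records


def run(data, records, forget=frozenset()):
    """Execute the logged schedule on a scalar toy model, skipping forgotten IDs."""
    w, grad_sum, n = 0.0, 0.0, 0
    for rec in records:
        assert hash_ids(rec.ordered_ids) == rec.ids_hash, "log does not match IDs"
        for sid in rec.ordered_ids:
            if sid in forget:
                continue
            # Noise depends only on (seed, sample ID), so dropping one sample
            # does not shift the random stream seen by the others.
            noise = random.Random(rec.rng_seed * 1_000_003 + sid).gauss(0.0, 0.01)
            grad_sum += (w - data[sid]) + noise  # gradient of 0.5 * (w - x)^2
            n += 1
        if rec.accum_boundary:
            if n:  # skip the update if every sample in the window was forgotten
                w -= rec.lr * grad_sum / n
            grad_sum, n = 0.0, 0
    return w


data = {i: float(i) for i in range(8)}
forget = frozenset({3})
log = make_log(list(data), microbatch_size=2, accum_steps=2, base_seed=42, lr=0.1)

w_full = run(data, log)            # original training run
w_replay = run(data, log, forget)  # replay the log, filtering the forget set

# Reference: the same schedule with the forgotten ID never present.
retain_log = [
    MicrobatchRecord(kept, hash_ids(kept), r.rng_seed, r.lr, r.opt_step, r.accum_boundary)
    for r in log
    for kept in [tuple(i for i in r.ordered_ids if i not in forget)]
]
w_retain = run(data, retain_log)
```

In this toy setting, `w_replay == w_retain` holds exactly because both runs perform the same floating-point operations in the same order. On real hardware that property depends on the pinned stack and deterministic kernels the abstract names as preconditions.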
Taxonomy
Topics: Artificial Intelligence in Law
