A More Practical Approach to Machine Unlearning
David Zagardo

TL;DR
This paper presents practical methods for machine unlearning, emphasizing first-epoch gradient ascent, layer-specific techniques, and influence scoring to improve privacy and model compliance.
Contribution
It introduces a first-epoch gradient ascent approach and layer-based unlearning strategies, especially focusing on the embedding layer in GPT-2, for more effective and efficient unlearning.
Findings
First-epoch gradient unlearning outperforms multi-epoch methods.
Embedding layer in GPT-2 is key for effective unlearning.
Fuzzy matching and iterative unlearning offer different advantages.
Abstract
Machine learning models often incorporate vast amounts of data, raising significant privacy concerns. Machine unlearning, the ability to remove the influence of specific data points from a trained model, addresses these concerns. This paper explores practical methods for implementing machine unlearning, focusing on a first-epoch gradient-ascent approach. Key findings include:
1. Single vs. Multi-Epoch Unlearning: First-epoch gradient unlearning is more effective than multi-epoch gradients.
2. Layer-Based Unlearning: The embedding layer in GPT-2 is crucial for effective unlearning. Gradients from the output layers (11 and 12) have no impact. Efficient unlearning can be achieved using only the embedding layer, halving space complexity.
3. Influence Functions & Scoring: Techniques like Hessian Vector Product and the dot product of activations and tensors are used for quantifying…
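The abstract does not spell out the mechanics of first-epoch gradient ascent, so the following is only a minimal sketch of one plausible reading: record the gradient that a to-be-forgotten sample contributed during the first training epoch, then add that update back to the trained weights. The toy 1-D linear model, the data, and every name here (train, unlearn, LR, DATA, FORGET) are assumptions made for illustration, not the paper's GPT-2 setup or code.

```python
# Toy 1-D linear model y ~ w * x with squared-error loss.
# Every name below is hypothetical; this is not the paper's implementation.

def loss(w, x, y):
    """Squared error of the prediction w * x against the target y."""
    return (w * x - y) ** 2

def grad(w, x, y):
    """Derivative of the squared error with respect to w."""
    return 2.0 * (w * x - y) * x

def train(data, forget_index, lr, epochs):
    """Plain SGD that stores the forget sample's gradient from epoch 0 only."""
    w = 0.0
    first_epoch_grad = None
    for epoch in range(epochs):
        for i, (x, y) in enumerate(data):
            g = grad(w, x, y)
            if epoch == 0 and i == forget_index:
                first_epoch_grad = g  # later epochs' gradients are not stored
            w -= lr * g  # ordinary gradient-descent step
    return w, first_epoch_grad

def unlearn(w, stored_grad, lr):
    """Gradient ascent: add back the step that the stored gradient subtracted."""
    return w + lr * stored_grad

LR = 0.05
DATA = [(1.0, 2.0), (2.0, 4.0), (1.0, 5.0)]  # last point is the one to forget
FORGET = 2

w_trained, g_first = train(DATA, FORGET, LR, epochs=20)
w_unlearned = unlearn(w_trained, g_first, LR)

loss_before = loss(w_trained, *DATA[FORGET])
loss_after = loss(w_unlearned, *DATA[FORGET])
print(f"loss on forgotten sample: {loss_before:.4f} -> {loss_after:.4f}")
```

In this toy setup the loss on the forgotten sample rises after the ascent step, which is the intended direction of unlearning. The sketch stores a single scalar gradient; for a real network the stored object would be a gradient tensor per retained layer, which is where the abstract's point about keeping only the embedding layer would reduce storage.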
Taxonomy
Topics: Experimental Learning in Engineering
Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Adam · Attention Dropout · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer
