Temper-Then-Tilt: Principled Unlearning for Generative Models through Tempering and Classifier Guidance
Jacob L. Block, Mehryar Mohri, Aryan Mokhtari, Sanjay Shakkottai

TL;DR
This paper introduces T3-Unlearning, a novel method for unlearning in large generative models that combines tempering and classifier guidance, providing theoretical guarantees and improved empirical performance on the TOFU benchmark.
Contribution
The paper proposes T3-Unlearning, a two-step inference approach that enhances unlearning fidelity and efficiency, with theoretical analysis and empirical validation.
Findings
T3-Unlearning outperforms existing methods on the TOFU benchmark.
Tempering is essential for unlearning concentrated distributions.
Theoretical guarantees link classifier risk to unlearning error.
Abstract
We study machine unlearning in large generative models by framing the task as density ratio estimation to a target distribution rather than supervised fine-tuning. While classifier guidance is a standard approach for approximating this ratio and can succeed in general, we show it can fail to faithfully unlearn with finite samples when the forget set represents a sharp, concentrated data distribution. To address this, we introduce Temper-Then-Tilt Unlearning (T3-Unlearning), which freezes the base model and applies a two-step inference procedure: (i) tempering the base distribution to flatten high-confidence spikes, and (ii) tilting the tempered distribution using a lightweight classifier trained to distinguish retain from forget samples. Our theoretical analysis provides finite-sample guarantees linking the surrogate classifier's risk to unlearning error, proving that tempering is…
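The two-step inference procedure described in the abstract can be illustrated on a single next-token distribution. The sketch below is only an assumption-laden toy, not the authors' implementation: the function name `temper_then_tilt`, the `temperature` parameter, and the per-token `retain_prob` array (standing in for the lightweight classifier's output) are hypothetical, and the tilt uses the classifier's odds `c / (1 - c)` as a density-ratio estimate, which is the usual classifier-guidance form but may differ from the paper's exact formulation.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def temper_then_tilt(base_logits, retain_prob, temperature=2.0, eps=1e-8):
    """Toy temper-then-tilt step on one categorical distribution.

    base_logits: logits of the frozen base model (hypothetical input).
    retain_prob: assumed classifier probability that each outcome
                 belongs to the retain set rather than the forget set.
    temperature: values above 1 flatten high-confidence spikes;
                 1.0 disables tempering (tilt only).
    """
    base_logits = np.asarray(base_logits, dtype=float)
    retain_prob = np.clip(np.asarray(retain_prob, dtype=float), eps, 1.0 - eps)

    # Step (i): temper the base distribution by dividing logits by the temperature.
    tempered_logits = base_logits / temperature

    # Step (ii): tilt by the classifier's log-odds, an estimate of the
    # log density ratio between retain and forget data (an assumption here).
    log_odds = np.log(retain_prob) - np.log1p(-retain_prob)

    return softmax(tempered_logits + log_odds)


# A sharply concentrated base distribution whose spike (index 0) is forget content.
base_logits = np.array([8.0, 1.0, 0.0, 0.0])
retain_prob = np.array([0.05, 0.9, 0.9, 0.9])

tilt_only = temper_then_tilt(base_logits, retain_prob, temperature=1.0)
tempered_and_tilted = temper_then_tilt(base_logits, retain_prob, temperature=2.0)
```

In this toy setting, tilting alone leaves the forget outcome as the most likely one because the base spike outweighs the classifier's correction, while tempering first lets the same classifier push it below the retain outcomes. This mirrors, in a very simplified way, the claim that tempering matters when the forget distribution is sharply concentrated.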
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Evolutionary Algorithms and Applications · Machine Learning and Data Classification
