Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining
Aarash Abro, Muhammad Tahir

TL;DR
This paper introduces DG-Hard, a spectral repair method that recovers language-model capabilities damaged by fine-tuning. It works post hoc and without retraining, by spectrally filtering the fine-tuning weight update.
Contribution
The paper presents DG-Hard, a novel spectral filtering approach for post-hoc repair of fine-tuned models that restores damaged capabilities while preserving target-task gains and beneficial held-out improvements.
Findings
DG-Hard achieves the strongest balanced repair across multiple benchmarks.
It restores safety alignment degraded by benign fine-tuning.
Spectral residue in weight updates is a key factor in capability loss.
Abstract
Fine-tuning a language model for a target task routinely degrades capabilities the training data never explicitly threatened. We study this phenomenon, known as catastrophic forgetting, and propose a post-hoc repair solution that uses only the pretrained checkpoint and its fine-tuned descendant. The goal is not merely to revert the model toward the base checkpoint, but to recover capabilities damaged by fine-tuning while preserving both the target-task gains and any beneficial held-out improvements. We introduce DG-Hard, a checkpoint-only spectral repair method for the fine-tuning update. DG-Hard treats the update as a low-rank task-aligned signal embedded in an IID-like noise residual that gradient descent has no incentive to remove, and applies the Donoho-Gavish hard singular-value threshold to each…
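As an illustration of the idea described in the abstract, the following is a minimal sketch of hard singular-value thresholding applied to a single weight update. It is not the authors' implementation. It assumes the threshold is applied independently to each 2-D weight matrix, that the noise level is unknown (so the threshold is scaled by the median singular value), and that the published polynomial approximation of the Gavish-Donoho coefficient is adequate. The function names `dg_omega` and `hard_threshold_update` are hypothetical.

```python
import numpy as np


def dg_omega(beta: float) -> float:
    """Polynomial approximation of the Gavish-Donoho coefficient omega(beta)
    for the unknown-noise case, where beta is the aspect ratio in (0, 1]."""
    return 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43


def hard_threshold_update(w_base: np.ndarray, w_ft: np.ndarray):
    """Filter the fine-tuning update of one weight matrix (illustrative sketch).

    Returns the repaired weights and the number of singular values kept.
    """
    # Fine-tuning update for this matrix.
    delta = w_ft - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)

    # Aspect ratio of the matrix (shorter side over longer side).
    m, n = delta.shape
    beta = min(m, n) / max(m, n)

    # Hard threshold: omega(beta) times the median singular value.
    tau = dg_omega(beta) * np.median(s)
    keep = s > tau

    # Rebuild the update from the retained singular triplets only.
    delta_filtered = (u[:, keep] * s[keep]) @ vt[keep, :]
    return w_base + delta_filtered, int(keep.sum())
```

In this sketch, singular values at or below the threshold are zeroed and the rest are kept unchanged, which is what distinguishes hard thresholding from soft shrinkage. How the paper handles non-matrix parameters, or whether it uses a known-noise variant of the threshold, is not stated in the text above.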
