Surgical Knowledge Rewrite in Compact LLMs: An 'Unlearn-then-Learn' Strategy with ($IA^3$) for Localized Factual Modulation and Catastrophic Forgetting Mitigation
Stanley Ngugi

TL;DR
This paper presents a novel 'unlearn-then-learn' strategy using $IA^3$ for precise knowledge editing in compact LLMs, significantly reducing catastrophic forgetting and improving localization of factual updates.
Contribution
It introduces a two-stage, mechanistically informed approach combining circuit localization with PEFT to achieve accurate fact updates and mitigate forgetting in LLMs.
Findings
Achieves 98.50% accuracy in new fact modulation.
Suppresses the original conflicting fact with a 96.00% forget rate.
Dramatically improves localization accuracy to 72.00%.
Abstract
Large Language Models (LLMs) struggle with dynamic knowledge updates, especially when new information conflicts with deeply embedded facts. Such conflicting factual edits often lead to two critical issues: resistance to adopting the new fact and severe catastrophic forgetting of unrelated knowledge. This paper introduces and evaluates a novel "unlearn-then-learn" strategy for precise knowledge editing in LLMs, leveraging the parameter-efficient fine-tuning (PEFT) technique, Infused Adapter by Inhibiting and Amplifying Inner Activations ($IA^3$). Crucially, this two-stage approach is powered by an initial circuit localization phase that identifies and targets the specific internal components responsible for encoding the conflicting fact. Through a rigorous experimental methodology on microsoft/Phi-3-mini-4k-instruct, we demonstrate that this mechanistically informed two-stage approach…
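As a rough illustration of the $IA^3$ idea named in the abstract, the sketch below rescales the output of a frozen linear layer with a learned per-feature vector. It is a toy, pure-Python example and not the paper's implementation: the weights, inputs, scale values, and the helper names `linear` and `ia3_linear` are all hypothetical, and the "edited" scale vector is set by hand rather than trained.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Frozen base layer: plain matrix-vector product, no bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def ia3_linear(weights: Matrix, scale: Vector, x: Vector) -> Vector:
    """IA^3-style layer: the frozen output is rescaled elementwise.

    Only `scale` would be trained; `weights` stays untouched, so the
    edit can be removed again by resetting `scale` to all ones.
    """
    return [s * h for s, h in zip(scale, linear(weights, x))]


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
W = [[1.0, 2.0], [0.5, -1.0]]
x = [3.0, 1.0]

identity_scale = [1.0, 1.0]  # usual IA^3 initialization: behaves like the base layer
edited_scale = [0.0, 2.0]    # hand-picked: inhibit feature 0, amplify feature 1

base_out = linear(W, x)
init_out = ia3_linear(W, identity_scale, x)
edited_out = ia3_linear(W, edited_scale, x)

# The adapter adds one parameter per output feature, versus one per weight entry.
trainable_params = len(edited_scale)
frozen_params = sum(len(row) for row in W)
```

With a scale vector of all ones the adapted layer reproduces the base layer exactly, which is why small, targeted changes to a few scale entries can inhibit or amplify specific features while the underlying weights stay frozen.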
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications · Topic Modeling
