LoKI: Low-damage Knowledge Implanting of Large Language Models
Runyu Wang, Peng Ping, Zhengyu Guo, Xiaoye Zhang, Quan Shi, Liting Zhou, Tianbo Ji

TL;DR
LoKI is a novel fine-tuning method for large language models that minimizes catastrophic forgetting while maintaining high task performance, by leveraging mechanistic insights into knowledge storage in transformers.
Contribution
The paper introduces Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that preserves general capabilities during task adaptation better than existing PEFT methods.
Findings
LoKI outperforms existing PEFT methods in preserving general knowledge.
LoKI achieves task-specific performance comparable to or better than full-parameter fine-tuning and the compared PEFT methods.
LoKI demonstrates effectiveness across various model architectures.
Abstract
Fine-tuning adapts pretrained models for specific tasks but poses the risk of catastrophic forgetting (CF), where critical knowledge from pretraining is overwritten. To address the issue of CF in a general-purpose framework, we propose Low-damage Knowledge Implanting (LoKI), a parameter-efficient fine-tuning (PEFT) technique that utilizes recent mechanistic understanding of how knowledge is stored in transformer architectures. We compare LoKI against state-of-the-art PEFT methods in two real-world fine-tuning scenarios. The results show that LoKI preserves general capabilities significantly better. At the same time, its task-specific performance is comparable to or even surpasses that of full-parameter fine-tuning and these PEFT methods across various model architectures. Our work bridges the mechanistic insights of LLMs' knowledge storage with practical fine-tuning…
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
