Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
Duc N.M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

TL;DR
This paper investigates how compression affects LLMs' knowledge, argues that the knowledge is displaced rather than forgotten, and introduces an inference-time dynamic prompting method that restores performance while improving efficiency.
Contribution
The study introduces inference-time dynamic prompting (IDP) for recovering knowledge in compressed LLMs, demonstrates its advantages over re-training, and provides insights into knowledge displacement.
Findings
Prompting with IDP recovers post-compression performance as well as or better than re-training methods such as LoRA.
Knowledge is displaced, not forgotten, after compression.
Compared with re-training, IDP reduces inference latency by 60% and cuts the extra parameter size by 21x.
Abstract
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed model to (re)learn from data with additional parameters; the other presumes that knowledge is internally displaced and hence one requires merely "inference re-direction" with input-side augmentation such as prompting, to recover the knowledge-related performance. Extensive experiments are then designed to (in)validate the two conjectures. We observe the promise of prompting in comparison to model tuning; we further unlock prompting's potential by introducing a variant called Inference-time…
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
