Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

Duc N.M Hoang; Minsik Cho; Thomas Merth; Mohammad Rastegari; Zhangyang Wang

arXiv:2310.00867·cs.CL·February 19, 2024

TL;DR

This paper investigates how compression affects the knowledge stored in LLMs. It argues that knowledge is displaced rather than forgotten, and introduces inference-time dynamic prompting (IDP), a prompting method that recovers post-compression performance at lower parameter and latency cost than re-training.

Contribution

The study introduces inference-time dynamic prompting (IDP) for recovering knowledge in compressed LLMs, shows its advantages over re-training, and offers evidence that compression displaces knowledge rather than erasing it.

Findings

01

Prompting with IDP recovers post-compression performance as well as or better than re-training methods such as LoRA.

02

Knowledge is displaced, not forgotten, after compression.

03

Compared with re-training baselines such as LoRA, IDP cuts inference latency by 60% and the extra parameter size by 21x.

Abstract

Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed model to (re)learn from data with additional parameters; the other presumes that knowledge is internally displaced and hence one requires merely "inference re-direction" with input-side augmentation such as prompting, to recover the knowledge-related performance. Extensive experiments are then designed to (in)validate the two conjectures. We observe the promise of prompting in comparison to model tuning; we further unlock prompting's potential by introducing a variant called Inference-time…

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques