Evaluating the Impact of Compression Techniques on Task-Specific Performance of Large Language Models

Bishwash Khanal; Jeffery M. Capone

arXiv:2409.11233 · cs.CL · September 18, 2024

Open Access

TL;DR

This paper assesses how different compression techniques affect large language models' performance on specific tasks, emphasizing the importance of evaluation metrics and calibration data in maintaining model utility after compression.

Contribution

It introduces Jensen-Shannon (JS) Divergence as an evaluation metric that complements perplexity by capturing changes in model behavior after compression, and it highlights the importance of task-specific calibration data for better compressed-model performance.
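
As a minimal sketch (not the authors' code), the JS divergence between a dense model's and a compressed model's next-token distributions could be computed as below. It assumes natural logarithms, a softmax over raw logits, and a simple mean over token positions; the paper's exact aggregation is not given here, and the helper names (`js_divergence`, `mean_js`) are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def softmax(logits):
    """Numerically stable softmax over one position's logits."""
    mx = max(logits)
    exps = [math.exp(x - mx) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_js(dense_logits, compressed_logits):
    """Hypothetical aggregation: mean JS divergence over token positions."""
    scores = [
        js_divergence(softmax(a), softmax(b))
        for a, b in zip(dense_logits, compressed_logits)
    ]
    return sum(scores) / len(scores)

# Hypothetical logits for two token positions from a dense and a pruned model.
dense_logits = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
pruned_logits = [[2.0, 0.2, 0.9], [0.5, 0.5, 3.0]]
print(mean_js(dense_logits, pruned_logits))
```

JS divergence is symmetric and, in nats, bounded by ln 2, so it compares the full output distributions rather than only the probability assigned to the reference token.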

Findings

01. SparseGPT and Wanda maintain perplexity at high sparsity levels
02. Perplexity alone is insufficient to evaluate compression impact
03. Task-specific calibration data improves downstream task performance
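
To illustrate why calibration data can matter, the sketch below contrasts magnitude scoring, |W_ij|, with a Wanda-style score, |W_ij| times the L2 norm of input feature j over calibration activations, pruning the lowest-scoring weights within each output row as described in the original Wanda work. This is an illustrative assumption, not this paper's code: per-row magnitude pruning is a simplification (a layer-wise threshold is also common), the toy weights and activations are made up, and SparseGPT, which additionally reconstructs the remaining weights using second-order information, is not shown.

```python
import math

def feature_norms(calib_acts):
    """L2 norm of each input feature across calibration tokens.
    calib_acts: list of token activation vectors, each of length in_features."""
    n_features = len(calib_acts[0])
    return [math.sqrt(sum(tok[j] ** 2 for tok in calib_acts)) for j in range(n_features)]

def prune_per_row(weights, scores, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for w_row, s_row in zip(weights, scores):
        k = int(len(w_row) * sparsity)  # number of weights to drop in this row
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

def magnitude_prune(weights, sparsity):
    """Score each weight by its absolute value only."""
    scores = [[abs(w) for w in row] for row in weights]
    return prune_per_row(weights, scores, sparsity)

def wanda_prune(weights, calib_acts, sparsity):
    """Wanda-style score: |W_ij| scaled by the calibration norm of input feature j."""
    norms = feature_norms(calib_acts)
    scores = [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]
    return prune_per_row(weights, scores, sparsity)

# Toy example (hypothetical values): one output row, four input features,
# two calibration tokens in which features 2 and 3 carry large activations.
W = [[0.5, -0.4, 0.1, 0.3]]
calib = [[0.1, 0.1, 5.0, 1.0], [0.1, 0.1, 5.0, 1.0]]
print(magnitude_prune(W, 0.5))     # [[0.5, -0.4, 0.0, 0.0]]
print(wanda_prune(W, calib, 0.5))  # [[0.0, 0.0, 0.1, 0.3]]
```

At 50% sparsity, magnitude pruning keeps the two largest weights, while the Wanda-style score keeps the two weights attached to high-activation features. Swapping in task-specific calibration activations would change those feature norms and therefore which weights survive.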

Abstract

Large language models (LLMs) offer powerful capabilities but incur substantial computational costs, driving the need for efficient compression techniques. This study evaluates the impact of popular compression methods (Magnitude Pruning, SparseGPT, and Wanda) on the LLaMA-2-7B model, focusing on the trade-offs between model size reduction, downstream task performance, and the role of calibration data. Our findings reveal that while SparseGPT and Wanda preserve perplexity even at 50% sparsity, they suffer significant degradation on downstream tasks, highlighting the inadequacy of perplexity as the sole evaluation metric. To address this, we introduce Jensen-Shannon (JS) Divergence as a more comprehensive metric that captures nuanced changes in model behavior post-compression. We further demonstrate that task-specific calibration data significantly enhances the downstream performance of…
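
Perplexity is the exponential of the mean negative log-likelihood of the reference tokens, so it only sees the probability assigned to the correct token. The minimal sketch below uses made-up distributions (not results from the paper) to show a dense and a compressed model with identical perplexity even though the compressed model reshuffles probability among the other tokens; a distribution-level metric such as JS divergence would register that change. The helper names are hypothetical.

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) of the reference tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

def reference_log_probs(dists, targets):
    """Log-probability each predicted distribution assigns to its reference token."""
    return [math.log(d[t]) for d, t in zip(dists, targets)]

# Hypothetical next-token distributions over a 3-token vocabulary at one position.
dense = [[0.6, 0.3, 0.1]]
compressed = [[0.6, 0.05, 0.35]]
targets = [0]  # the reference token is index 0 in both cases

ppl_dense = perplexity(reference_log_probs(dense, targets))
ppl_compressed = perplexity(reference_log_probs(compressed, targets))
print(ppl_dense == ppl_compressed)  # True: both perplexities are about 1.667
```

Because only the reference-token probability enters the formula, any redistribution of the remaining mass leaves perplexity unchanged, which is one way a compressed model can match the dense model's perplexity yet behave differently downstream.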

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Pruning