Evaluating Zero-Shot Long-Context LLM Compression

Chenyu Wang; Yihan Wang; Kai Li

arXiv:2406.06773·cs.CL·February 14, 2025

TL;DR

This paper investigates zero-shot compression methods for large language models in long-context scenarios, identifying error patterns and proposing hypotheses and remedies to improve performance, based on experiments with LLaMA-2-7B-32K.

Contribution

The paper proposes a hypothesis for why different compression techniques behave differently in long-context settings, and explores remedies to mitigate the resulting performance decline in LLMs.

Findings

01. Computational errors tend to increase in long-context settings when certain compression methods are applied.

02. Different compression techniques behave differently in long-context settings.

03. The proposed remedies can mitigate the performance decline for some techniques.

Abstract

This study evaluates the effectiveness of zero-shot compression techniques on large language models (LLMs) in long-context settings. We identify a tendency for computational errors to increase in long-context settings when certain compression methods are employed. We propose a hypothesis to explain the varied behavior of different LLM compression techniques and explore remedies to mitigate the performance decline that some techniques show in long-context settings. This is a course report for COS 598D Machine Learning and Systems by Prof. Kai Li at Princeton University. Due to limited computational resources, our experiments were conducted only on LLaMA-2-7B-32K.

Taxonomy

Topics: Medical Imaging Techniques and Applications · Advanced MRI Techniques and Applications · Advanced Data Compression Techniques