Evaluating Zero-Shot Long-Context LLM Compression
Chenyu Wang, Yihan Wang, Kai Li

TL;DR
This paper evaluates zero-shot compression methods for large language models in long-context settings. It identifies a tendency for computational errors to increase with certain methods, proposes a hypothesis for why techniques behave differently, and explores remedies for the resulting performance decline. All experiments use LLaMA-2-7B-32K.
Contribution
The paper proposes a hypothesis for why different compression techniques behave differently in long-context settings, and explores remedies to mitigate the long-context performance decline observed in some of them.
Findings
Computational errors tend to increase in long-context settings when certain compression methods are used.
Different compression techniques behave differently in long-context settings.
The explored remedies aim to mitigate the performance decline seen in some techniques.
Abstract
This study evaluates the effectiveness of zero-shot compression techniques on large language models (LLMs) under long-context. We identify the tendency for computational errors to increase under long-context when employing certain compression methods. We propose a hypothesis to explain the varied behavior of different LLM compression techniques and explore remedies to mitigate the performance decline observed in some techniques under long-context. This is a course report for COS 598D Machine Learning and Systems by Prof. Kai Li at Princeton University. Due to limited computational resources, our experiments were conducted only on LLaMA-2-7B-32K.
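For readers unfamiliar with the terminology, the following is a minimal sketch of what "zero-shot" compression and "computational error" can mean in practice. It is not the paper's method or code: the two techniques shown (round-to-nearest quantization and magnitude pruning), the toy random weight matrix, and all function names are assumptions chosen for illustration. Both techniques modify weights without any retraining, and the error is measured as the relative difference between the original and compressed layer outputs.

```python
import random


def quantize_rtn(weights, bits=4):
    """Symmetric round-to-nearest quantization, one scale per row (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # Snap each weight to the nearest representable level, then dequantize.
        out.append([round(w / scale) * scale for w in row])
    return out


def prune_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights in each row (illustrative)."""
    out = []
    for row in weights:
        k = int(len(row) * sparsity)
        order = sorted(range(len(row)), key=lambda i: abs(row[i]))
        drop = set(order[:k])
        out.append([0.0 if i in drop else w for i, w in enumerate(row)])
    return out


def matvec(weights, x):
    """Apply a linear layer (no bias) to a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relative_error(reference, approx):
    """L2 norm of the output difference divided by the L2 norm of the reference."""
    num = sum((a - b) ** 2 for a, b in zip(reference, approx)) ** 0.5
    den = sum(a ** 2 for a in reference) ** 0.5
    return num / den


# Toy stand-in for one linear layer; a real evaluation would use model weights.
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(64)] for _ in range(32)]
x = [rng.gauss(0, 1) for _ in range(64)]

y = matvec(W, x)
err_quant = relative_error(y, matvec(quantize_rtn(W, bits=4), x))
err_prune = relative_error(y, matvec(prune_magnitude(W, sparsity=0.5), x))

print(f"relative output error, 4-bit RTN quantization: {err_quant:.3f}")
print(f"relative output error, 50% magnitude pruning:  {err_prune:.3f}")
```

This toy measures error for a single layer and a single input only. It says nothing about how error behaves as context length grows, which is the question the paper studies on LLaMA-2-7B-32K.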
Taxonomy
Topics: Medical Imaging Techniques and Applications · Advanced MRI Techniques and Applications · Advanced Data Compression Techniques
