FoldGPT: Simple and Effective Large Language Model Compression Scheme

Songwei Liu; Chao Zeng; Lianqiang Li; Chenqian Yan; Lean Fu; Xing Mei; Fangmin Chen

arXiv:2407.00928 · cs.LG · July 2, 2024 · 1 citation

Open Access

TL;DR

FoldGPT introduces a simple yet effective compression method for large language models by removing redundant layers and sharing parameters, significantly reducing model size while maintaining performance.

Contribution

The paper presents a novel compression scheme combining block removal and parameter sharing, leveraging layer output similarity to efficiently compress LLMs.

Findings

01

FoldGPT outperforms previous SOTA compression methods.

02

Redundant layer outputs increase with model size, enabling effective removal.

03

Parameter sharing within groups maintains model performance after compression.

Abstract

The demand for deploying large language models (LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and find that the outputs of most layers exhibit significant similarity. Moreover, this similarity becomes more pronounced as the model size increases, indicating substantial redundancy in the depth direction of LLMs. Based on this observation, we propose an efficient model volume compression strategy, termed FoldGPT, which combines block removal and block parameter sharing. This strategy consists of three parts: (1) Based on the learnable gating parameters, we determine the block importance ranking while modeling the…
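The layer-similarity observation in the abstract can be illustrated with a small sketch. This is not the paper's method: the abstract ranks blocks with learnable gating parameters, whereas the code below swaps in a simpler proxy, the cosine similarity between the outputs of adjacent layers. The function names, the toy hidden states, and the idea of treating high-similarity blocks as removal candidates are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def adjacent_layer_similarity(hidden_states):
    """hidden_states[i] is the output vector of layer i for one token.

    Returns one similarity per transition: entry i compares the output of
    layer i with the output of layer i + 1.
    """
    return [
        cosine_similarity(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]


def rank_transitions_by_redundancy(hidden_states):
    """Order transitions from most to least redundant (hypothetical proxy).

    A transition whose output is nearly identical to its input changes the
    representation very little, so it is listed first.
    """
    sims = adjacent_layer_similarity(hidden_states)
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)


# Toy hidden states (assumed values, not taken from the paper): the first
# three layers barely change the vector, the last one rotates it sharply.
toy_hidden_states = [
    [1.0, 0.0],
    [1.0, 0.1],
    [1.0, 0.12],
    [0.0, 1.0],
]

sims = adjacent_layer_similarity(toy_hidden_states)
ranking = rank_transitions_by_redundancy(toy_hidden_states)
```

In this toy setup the first two transitions score close to 1 and the last scores much lower, so a similarity-based heuristic would flag the early transitions as the more redundant ones. A real measurement would average over many tokens and inputs rather than a single vector per layer.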

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis