Lightweight and Post-Training Structured Pruning for On-Device Large Language Models

Zihuai Xu; Yang Xu; Hongli Xu; Yunming Liao; Zhiwei Yao; Zuan Xie

arXiv:2501.15255·cs.LG·January 28, 2025

TL;DR

This paper presents COMP, a lightweight post-training structured pruning method for large language models that reduces resource demands on devices without fine-tuning, using hybrid pruning and a new importance metric.

Contribution

The paper introduces COMP, a novel pruning approach combining coarse and fine-grained pruning with mask tuning, suitable for on-device LLM deployment without fine-tuning.

Findings

01 Achieves a 6.13% performance improvement over LLM-Pruner on LLaMA-2-7B at a 20% pruning ratio

02 Reduces memory overhead by 80%

03 Outperforms LLM-Pruner in efficiency and effectiveness

Abstract

Owing to its hardware-friendly characteristics and broad applicability, structured pruning has emerged as an efficient way to reduce the resource demands of large language models (LLMs) on resource-constrained devices. Traditional structured pruning methods often need fine-tuning to recover lost performance, which incurs high memory overhead and substantial data requirements, making them unsuitable for on-device applications. In addition, post-training structured pruning techniques typically require specific activation functions or architectural modifications, which limits their applicability. We introduce COMP, a lightweight post-training structured pruning method that employs a hybrid-granularity pruning strategy. COMP first prunes selected model layers based on their importance at a coarse granularity, followed by fine-grained neuron pruning…
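
The coarse-then-fine order described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated here and does not define COMP's importance metric or its mask tuning, so the importance scores are treated as given inputs, and the names `select_layers_to_prune`, `neuron_mask`, `hybrid_prune`, and the `keep_ratio` parameter are hypothetical.

```python
from typing import Dict, List, Tuple


def select_layers_to_prune(layer_importance: List[float], num_to_drop: int) -> List[int]:
    """Coarse step: return the indices of the least important layers."""
    ranked = sorted(range(len(layer_importance)), key=lambda i: layer_importance[i])
    return sorted(ranked[:num_to_drop])


def neuron_mask(neuron_importance: List[float], keep_ratio: float) -> List[bool]:
    """Fine step: keep the highest-scoring fraction of neurons in one layer."""
    n_keep = max(1, round(len(neuron_importance) * keep_ratio))
    ranked = sorted(
        range(len(neuron_importance)),
        key=lambda i: neuron_importance[i],
        reverse=True,
    )
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(neuron_importance))]


def hybrid_prune(
    layer_importance: List[float],
    neuron_importance_per_layer: List[List[float]],
    num_layers_to_drop: int,
    keep_ratio: float,
) -> Tuple[List[int], Dict[int, List[bool]]]:
    """Drop whole layers first, then build neuron masks for the surviving layers."""
    dropped = set(select_layers_to_prune(layer_importance, num_layers_to_drop))
    masks: Dict[int, List[bool]] = {}
    for idx, scores in enumerate(neuron_importance_per_layer):
        if idx in dropped:
            continue  # layer already removed at coarse granularity
        masks[idx] = neuron_mask(scores, keep_ratio)
    return sorted(dropped), masks


# Toy scores (made up for illustration, not taken from the paper).
dropped, masks = hybrid_prune(
    layer_importance=[0.9, 0.1, 0.5],
    neuron_importance_per_layer=[
        [3.0, 1.0, 2.0, 4.0],
        [1.0, 1.0, 1.0, 1.0],
        [0.2, 0.8, 0.6, 0.4],
    ],
    num_layers_to_drop=1,
    keep_ratio=0.5,
)
```

In this toy run the lowest-scoring layer is removed outright, and each remaining layer keeps the half of its neurons with the highest scores. How the scores are computed, and how the two granularities share the overall pruning budget, is left open because the source does not specify it.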

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Robotics and Sensor-Based Localization

Methods: Pruning