TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs

Lanxiang Hu; Tajana Rosing; Hao Zhang

arXiv:2412.11242·cs.LG·December 20, 2024

TL;DR

TrimLLM reduces the depth of large language models through progressive layer dropping, speeding up inference and retaining domain-specific performance without relying on hardware-dependent compression techniques.

Contribution

The paper introduces a compression approach for LLMs based on layer-wise specialization, which delivers hardware-agnostic speedup while maintaining accuracy on domain-specific tasks.

Findings

01. Achieves 2.1-5.7x inference speedup on GPUs.

02. Maintains accuracy with 50-60% model compression.

03. Effective across various LLM sizes and domains.

Abstract

Specializing large language models (LLMs) for local deployment in domain-specific use cases is necessary for strong performance while meeting latency and privacy constraints. However, conventional task-specific adaptation approaches do not show simultaneous memory saving and inference speedup at deployment time. Practical compression techniques like quantization and pruning require dedicated hardware or kernel support to achieve measured inference speedup. We develop TrimLLM based on the layer-wise specialization phenomenon we empirically observed and verified on contemporary LLMs. TrimLLM reduces the depth of LLMs via progressive layer dropping. We show it retains LLMs' capacity in specific domains and achieves inference speedup irrespective of hardware and deep learning frameworks. We evaluated TrimLLM on LLMs of various sizes for inference; models adapted on medical, legal, and…
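To make the idea of progressive layer dropping concrete, here is a minimal sketch on a toy model. It is not the authors' implementation. The abstract does not say how TrimLLM chooses which layer to drop, so the greedy rule used here (drop the layer whose removal hurts a domain loss the least) is an assumption, and the names `forward`, `domain_loss`, and `progressive_layer_drop` are hypothetical. Any fine-tuning between drops is left out.

```python
from typing import Callable, List, Sequence, Tuple

# Toy stand-in for a transformer block: a function from float to float.
Layer = Callable[[float], float]
Example = Tuple[float, float]  # (input, target)


def forward(layers: Sequence[Layer], x: float) -> float:
    """Run the input through the layer stack in order."""
    for layer in layers:
        x = layer(x)
    return x


def domain_loss(layers: Sequence[Layer], data: Sequence[Example]) -> float:
    """Mean squared error on a small domain-specific dataset."""
    return sum((forward(layers, x) - y) ** 2 for x, y in data) / len(data)


def progressive_layer_drop(
    layers: Sequence[Layer], data: Sequence[Example], target_depth: int
) -> List[Layer]:
    """Drop one layer per step until only target_depth layers remain.

    ASSUMPTION: the layer to drop is chosen greedily as the one whose
    removal yields the lowest domain loss. The source does not specify
    the selection criterion.
    """
    kept = list(layers)
    while len(kept) > target_depth:
        drop_idx = min(
            range(len(kept)),
            key=lambda i: domain_loss(kept[:i] + kept[i + 1:], data),
        )
        del kept[drop_idx]
    return kept


# Four toy layers; two of them do nothing useful for the target domain.
def double(x: float) -> float:
    return 2.0 * x


def identity_a(x: float) -> float:
    return x


def add_one(x: float) -> float:
    return x + 1.0


def identity_b(x: float) -> float:
    return x


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # target: y = 2x + 1
model = [double, identity_a, add_one, identity_b]
trimmed = progressive_layer_drop(model, data, target_depth=2)
```

In this toy setting the two identity layers are removed and the domain loss is unchanged. Because the result is simply a shallower stack of the same kind of layers, any latency benefit would come from doing fewer layer evaluations rather than from special kernels, which matches the hardware-agnostic claim in the abstract.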

Taxonomy

Topics: Advanced Data Storage Technologies

Methods: Pruning