Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers

Shuzhou Yuan; Ercong Nie; Bolei Ma; Michael Färber

arXiv:2402.11700 · cs.CL · April 18, 2025 · 1 citation

TL;DR

This paper investigates reducing the number of layers in large language models, revealing that fewer layers can maintain or even improve performance, especially in prompt-based fine-tuning, offering a new way to make LLMs more efficient.

Contribution

It systematically explores layer reduction in LLMs, demonstrating that significantly smaller models can match or outperform full models in certain NLP tasks.

Findings

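01. LLMs with fewer layers can match or exceed full-model performance, particularly in prompt-based fine-tuning for text classification.

02. In certain cases, single-layer models outperform their fully layered counterparts.

03. Layer reduction shrinks the model, which eases the storage, training, and inference costs of deploying LLMs.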

Abstract

Large Language Models (LLMs) possess outstanding capabilities in addressing various natural language processing (NLP) tasks. However, the sheer size of these models poses challenges in terms of storage, training and inference due to the inclusion of billions of parameters through layer stacking. While traditional approaches such as model pruning or distillation offer ways for reducing model size, they often come at the expense of performance retention. In our investigation, we systematically explore the approach of reducing the number of layers in LLMs. Surprisingly, we observe that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks. Remarkably, in certain cases, models with a single layer outperform their fully layered counterparts. These findings offer valuable insights for future work…
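To make the idea of "cutting off the layers" concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the `LayerStack` class, the `cut_off_layers` helper, and the toy layers are all hypothetical, and the choice to keep the first `keep` layers is an assumption, since the abstract does not say which layers are retained.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "layer" here is just a function from a vector to a vector.
# Real transformer layers are far richer; this is a toy stand-in.
Layer = Callable[[List[float]], List[float]]


@dataclass
class LayerStack:
    """Hypothetical model: a plain sequence of layers applied in order."""
    layers: List[Layer]

    def forward(self, x: List[float]) -> List[float]:
        for layer in self.layers:
            x = layer(x)
        return x


def cut_off_layers(model: LayerStack, keep: int) -> LayerStack:
    """Return a slimmer model that retains only the first `keep` layers.

    Assumption: layers are dropped from the end of the stack. The abstract
    does not specify which layers are kept.
    """
    if not 1 <= keep <= len(model.layers):
        raise ValueError(f"keep must be in [1, {len(model.layers)}], got {keep}")
    return LayerStack(layers=model.layers[:keep])


def make_toy_layer(scale: float) -> Layer:
    """Build a toy layer that scales each value and adds 1."""
    return lambda x: [scale * v + 1.0 for v in x]


# A 12-layer toy model and its single-layer counterpart.
full = LayerStack([make_toy_layer(0.5) for _ in range(12)])
slim = cut_off_layers(full, keep=1)

print(len(full.layers), len(slim.layers))  # 12 1
print(slim.forward([4.0]))                 # [3.0]
```

In the setting the abstract describes, the slimmed model would then be fine-tuned (for example with prompt-based fine-tuning) and compared against the full model on the target task; that step is omitted here.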

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Topic Modeling

Methods: Pruning