LESA: Learnable LLM Layer Scaling-Up

Yifei Yang; Zouying Cao; Xinbei Ma; Yao Yao; Libo Qin; Zhi Chen; Hai Zhao

arXiv:2502.13794 · cs.LG · February 20, 2025

TL;DR

LESA introduces a learnable layer scaling-up method for LLMs that improves initialization and training efficiency by predicting inter-layer parameters, outperforming existing methods at reduced computational cost.

Contribution

LESA proposes a novel neural-network-based approach to depth scaling-up of LLMs that learns the inter-layer parameters (the parameters of layers inserted between existing ones), enabling faster and more effective training.
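Depth scaling-up can be pictured as splicing new layers into an existing stack. The Python sketch below is a hypothetical illustration, not code from the official repository: `scale_up_depth`, `duplicate_lower`, the `insert_after` positions, and the toy 4x4 matrices are all assumptions. The `make_layer` hook receives the two neighbouring layers; `duplicate_lower` mimics a heuristic duplication rule of the kind the abstract criticizes, and a learnable method would supply a trained predictor at that hook instead.

```python
import numpy as np

def scale_up_depth(layers, insert_after, make_layer):
    """Return a deeper stack with one new layer after each index in
    insert_after, built by make_layer(lower, upper) from its two neighbours."""
    positions = set(insert_after)
    if any(not 0 <= i < len(layers) - 1 for i in positions):
        raise ValueError("new layers must sit between two existing layers")
    deeper = []
    for i, weight in enumerate(layers):
        deeper.append(weight)
        if i in positions:
            deeper.append(make_layer(layers[i], layers[i + 1]))
    return deeper

def duplicate_lower(lower, upper):
    """Heuristic baseline (hypothetical): copy the lower neighbour verbatim."""
    return lower.copy()

rng = np.random.default_rng(0)
layers = [rng.standard_normal((4, 4)) for _ in range(6)]  # toy 6-layer "model"
deeper = scale_up_depth(layers, insert_after=[3, 4], make_layer=duplicate_lower)
print(len(deeper))  # 8
```

Replacing `duplicate_lower` with a learned function of both neighbours is, in spirit, the change a learnable scaling-up method makes; the positions passed in `insert_after` here are arbitrary and are not the paper's configuration.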

Findings

01

LESA achieves better performance than baseline methods.

02

LESA reduces training cost by over 50%.

03

LESA demonstrates robustness across various model sizes and tasks.

Abstract

Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. However, existing depth scaling-up methods rely on empirical heuristic rules for layer duplication, which result in poorer initialization and slower convergence during continual pre-training. We propose LESA, a novel learnable method for depth scaling-up. By concatenating parameters from each layer and applying Singular Value Decomposition, we uncover latent patterns between layers, suggesting that inter-layer parameters can be learned. LESA uses a neural network to predict the parameters inserted between adjacent layers, enabling better initialization and faster training. Experiments show that LESA outperforms existing baselines,…
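The SVD analysis and parameter prediction described in the abstract can be sketched in NumPy, under explicit assumptions: the weights are random toy data with a smooth drift across depth, the matrices of one weight type are concatenated horizontally, and a three-parameter linear model fitted by least squares stands in for LESA's neural network. The names `V_blocks` and `predict_block`, the layer index `j`, and the choice to predict each layer's factor from its two neighbours are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 8, 6

# Toy stand-in for one weight type (e.g. a d x d projection) in every layer.
# A shared base plus a linear drift with depth mimics inter-layer structure.
base = rng.standard_normal((d, d))
drift = rng.standard_normal((d, d))
weights = [base + 0.3 * i * drift + 0.05 * rng.standard_normal((d, d))
           for i in range(n_layers)]

# 1) Concatenate horizontally: [W_1, ..., W_L] has shape (d, L * d).
concat = np.concatenate(weights, axis=1)

# 2) Thin SVD: concat = U @ diag(S) @ Vt. Column block i of Vt (d x d) is the
#    layer-specific factor V_i; U and S are shared by all layers.
U, S, Vt = np.linalg.svd(concat, full_matrices=False)
V_blocks = np.split(Vt, n_layers, axis=1)
assert all(np.allclose(W, U @ np.diag(S) @ V) for W, V in zip(weights, V_blocks))

# 3) Fit a predictor on existing triples (V_{i-1}, V_{i+1}) -> V_i. A linear
#    model a * lower + b * upper + c fitted by least squares is a stand-in
#    (assumption) for the neural network LESA trains.
rows = [np.stack([V_blocks[i - 1].ravel(), V_blocks[i + 1].ravel(),
                  np.ones(d * d)], axis=1) for i in range(1, n_layers - 1)]
X = np.concatenate(rows)  # shape: (num_triples * d * d, 3)
y = np.concatenate([V_blocks[i].ravel() for i in range(1, n_layers - 1)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict_block(v_lower, v_upper):
    """Hypothetical predictor: combine two neighbouring factors linearly."""
    a, b, c = coef
    return a * v_lower + b * v_upper + c

# 4) Initialize a new layer between adjacent layers j and j + 1 by mapping the
#    predicted factor back through the shared U and S.
j = 2
W_new = U @ np.diag(S) @ predict_block(V_blocks[j], V_blocks[j + 1])
print(W_new.shape)  # (8, 8)
```

Because every layer shares U and diag(S) in this decomposition, only the per-layer factor V_i has to be predicted, and the inserted weight is recovered as U @ diag(S) @ V_new. Training on (i-1, i+1) → i triples and then applying the model to an adjacent pair is one plausible way to reuse the learned mapping; it is an assumption of this sketch.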

Code & Models

Repositories

yangyifei729/lesa · PyTorch · Official

Videos

LESA: Learnable LLM Layer Scaling-Up · Underline

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Speech Recognition and Synthesis · Neural Networks and Applications