DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs

Zhen Tan; Daize Dong; Xinyu Zhao; Jie Peng; Yu Cheng; Tianlong Chen

arXiv:2407.11030 · cs.LG · July 17, 2024 · 1 citation

TL;DR

This paper presents Dynamic Layer Operations (DLO), a method for vertically scaling transformer-based LLMs by dynamically expanding, activating, or skipping layers. It achieves results comparable to densely expanded models with significantly improved efficiency.

Contribution

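DLO introduces a dynamic layer routing policy, based on layerwise feature similarity, for vertical scaling of LLMs. The framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT).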

Findings

01. DLO outperforms the original unscaled models.

02. DLO achieves results comparable to densely expanded models with significantly improved efficiency.

03. Because DLO is integrated with the SFT stage, it avoids the cost of Continual Pre-Training when scaling model depth.

Abstract

In this paper, we introduce Dynamic Layer Operations (DLO), a novel approach for vertically scaling transformer-based Large Language Models (LLMs) by dynamically expanding, activating, or skipping layers using a sophisticated routing policy based on layerwise feature similarity. Unlike traditional Mixture-of-Experts (MoE) methods that focus on extending the model width, our approach targets model depth, addressing the redundancy observed across layer representations for various input samples. Our framework is integrated with the Supervised Fine-Tuning (SFT) stage, eliminating the need for resource-intensive Continual Pre-Training (CPT). Experimental results demonstrate that DLO not only outperforms the original unscaled models but also achieves comparable results to densely expanded models with significantly improved efficiency. Our work offers a promising direction for building…
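To make the idea of skipping layers "based on layerwise feature similarity" more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the names `cosine_similarity`, `skip_mask`, and `run_layer`, the toy layer, and the 0.99 threshold are all hypothetical choices made for illustration. The sketch assumes that a token can skip a layer when the layer's output is nearly identical (by cosine similarity) to its input. Because it computes the similarity after running the layer, it only illustrates how redundancy could be measured or turned into a training signal; an actual router would have to predict the decision from the layer input alone.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def skip_mask(layer_in: List[Vector], layer_out: List[Vector],
              threshold: float = 0.99) -> List[bool]:
    """Mark a token as skippable when the layer barely changes its representation.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [cosine_similarity(x, y) >= threshold
            for x, y in zip(layer_in, layer_out)]


def run_layer(tokens: List[Vector], layer_fn: Callable[[Vector], List[float]],
              mask: List[bool]) -> List[List[float]]:
    """Apply layer_fn only to unmasked tokens; skipped tokens pass through unchanged."""
    return [list(x) if skip else layer_fn(x) for x, skip in zip(tokens, mask)]


# Toy example: a hypothetical "layer" that scales the second feature by 3.
def toy_layer(x: Vector) -> List[float]:
    return [x[0], x[1] * 3.0]


tokens = [[1.0, 0.0], [1.0, 1.0]]
mask = skip_mask(tokens, [toy_layer(t) for t in tokens])
out = run_layer(tokens, toy_layer, mask)
```

In this toy example the first token is unchanged by the layer, so it is marked skippable, while the second token's representation changes direction and is still processed. Skipped tokens save the layer's compute, which is where the efficiency gain over a densely expanded model would come from under this assumption.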
