Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Weiqiao Shan, Long Meng, Tong Zheng, Yingfeng Luo, Bei Li, Junxin Wang, Tong Xiao, Jingbo Zhu

TL;DR
This paper demonstrates that early exit is an inherent feature of transformer-based large language models, and explores how to utilize it without additional layers or joint optimization, revealing insights into its behavior and potential.
Contribution
The paper shows that early exit (EE) arises naturally in transformer-based models and investigates how to exploit it without extra output layers or joint optimization, offering new insight into EE behavior.
Findings
EE is a natural capability in transformer models.
Joint optimization does not create the EE capability, but it improves the accuracy of locating the optimal EE layer.
EE patterns vary from a sub-word perspective.
Abstract
Large language models (LLMs) exhibit exceptional performance across various downstream tasks. However, they encounter limitations due to slow inference speeds stemming from their extensive parameters. The early exit (EE) is an approach that aims to accelerate auto-regressive decoding. EE generates outputs from intermediate layers instead of using the whole model, which offers a promising solution to this challenge. However, additional output layers and joint optimization used in conventional EE hinder the application of EE in LLMs. In this paper, we explore the possibility of LLMs EE without additional output layers and joint optimization. Our findings indicate that EE is a natural capability within transformer-based models. While joint optimization does not give model EE capability, it must be employed to address challenges by improving the accuracy of locating the optimal EE layer…
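The idea of exiting without additional output layers can be illustrated with a toy decoding step. The sketch below is not the authors' method: it assumes the model's single existing output head is applied to each intermediate hidden state, and it uses a simple top-1 probability threshold as a stand-in exit criterion. The names `early_exit_decode`, `toy_head`, `toy_hidden`, and the `threshold` parameter are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(hidden, head):
    """Apply one shared output head (vocab x dim) to a hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in head]


def early_exit_decode(hidden_per_layer, head, threshold=0.9):
    """Return (token_id, exit_layer) for a single decoding step.

    hidden_per_layer lists the hidden state after each layer, lowest first.
    Assumption: exit at the first layer whose top-1 probability reaches
    `threshold`; if no layer qualifies, use the last layer's prediction.
    """
    if not hidden_per_layer:
        raise ValueError("need at least one layer")
    for layer_idx, hidden in enumerate(hidden_per_layer, start=1):
        probs = softmax(project(hidden, head))
        token_id = max(range(len(probs)), key=probs.__getitem__)
        if probs[token_id] >= threshold:
            return token_id, layer_idx
    return token_id, layer_idx  # no early exit: full-depth prediction


# Toy data (made up): vocabulary of 3 tokens, hidden size 3, 4 layers.
toy_head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_hidden = [
    [0.1, 0.2, 0.0],  # layer 1: nearly uniform distribution
    [0.5, 3.0, 0.2],  # layer 2: token 1 favoured, still below 0.9
    [0.2, 6.0, 0.1],  # layer 3: token 1 confidently predicted
    [0.0, 8.0, 0.0],  # layer 4: final layer
]

print(early_exit_decode(toy_hidden, toy_head, threshold=0.9))
```

With these toy values the step exits at layer 3 with the same token the final layer would produce. Lowering the threshold moves the exit to an earlier layer, which is the speed versus accuracy trade-off that any exit criterion has to manage.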
Taxonomy
Topics: Early Exit · LLM Inference Acceleration · Transformer-based Language Models
Methods: LLaMA
