SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters
Yiping Wang, Hanxian Huang, Yifang Chen, Jishen Zhao, Simon Shaolei Du, Yuandong Tian

TL;DR
SHARP shares parameters between adjacent layers of a large language model to cut memory and computation costs on resource-limited devices, and uses low-rank recovery parameters plus fine-tuning to maintain model performance.
Contribution
The paper introduces SHARP, a novel layer-sharing approach with recovery parameters, enabling efficient LLM inference with minimal performance loss on mobile devices.
Findings
Reduces model storage by up to 65%
Cuts inference time by 42.2% on mobile devices
Maintains perplexity with limited fine-tuning data
Abstract
While large language models (LLMs) have advanced natural language processing tasks, their growing computational and memory demands make deployment on resource-constrained devices like mobile phones increasingly challenging. In this paper, we propose SHARP (SHaring Adjacent Layers with Recovery Parameters), a novel approach to accelerate LLM inference by sharing parameters across adjacent layers, thus reducing memory load overhead, while introducing low-rank recovery parameters to maintain performance. Inspired by observations that consecutive layers have similar outputs, SHARP employs a two-stage recovery process: Single Layer Warmup (SLW) and Supervised Fine-Tuning (SFT). The SLW stage aligns the outputs of the shared layers using L_2 loss, providing a good initialization for the following SFT stage to further restore the model performance. Extensive experiments demonstrate that SHARP…
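The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it replaces a transformer layer with a single linear map, uses plain gradient descent, and every name in it (single_layer_warmup, stored_fraction, the dimensions, the learning rate) is a hypothetical choice made for illustration. The "recovered" layer reuses the previous layer's weight plus a low-rank correction A·B, and the warmup step fits A and B by minimizing an L_2 loss between the recovered layer's outputs and the original layer's outputs.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def l2_loss(pred, target):
    """Squared L2 distance between outputs, averaged over input rows."""
    total = sum((p - t) ** 2 for rp, rt in zip(pred, target) for p, t in zip(rp, rt))
    return total / len(pred)

def stored_fraction(d_in, d_out, rank, group_size):
    """Fraction of weights kept when each group of adjacent layers stores one
    full matrix plus a rank-`rank` pair (A, B) for every other layer.
    Assumption: a simplified count for plain linear layers only."""
    full = d_in * d_out
    recovery = rank * (d_in + d_out)
    return (full + (group_size - 1) * recovery) / (group_size * full)

def single_layer_warmup(x, w_shared, w_target, rank, steps=300, lr=0.05, seed=0):
    """Fit low-rank recovery parameters so that x @ (w_shared + A @ B)
    approximates x @ w_target under an L2 loss (toy stand-in for SLW)."""
    rng = random.Random(seed)
    d_in, d_out = len(w_shared), len(w_shared[0])
    # LoRA-style start (an assumption): A small random, B zero, so A @ B = 0.
    a = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(d_in)]
    b = [[0.0] * d_out for _ in range(rank)]
    target = matmul(x, w_target)
    losses = []
    for _ in range(steps):
        pred = matmul(x, add(w_shared, matmul(a, b)))
        losses.append(l2_loss(pred, target))
        err = [[p - t for p, t in zip(rp, rt)] for rp, rt in zip(pred, target)]
        # Gradient of the loss with respect to the product A @ B.
        g = [[2.0 * v / len(x) for v in row] for row in matmul(transpose(x), err)]
        grad_a = matmul(g, transpose(b))
        grad_b = matmul(transpose(a), g)
        a = [[v - lr * gv for v, gv in zip(ra, rg)] for ra, rg in zip(a, grad_a)]
        b = [[v - lr * gv for v, gv in zip(rb, rg)] for rb, rg in zip(b, grad_b)]
    losses.append(l2_loss(matmul(x, add(w_shared, matmul(a, b))), target))
    return a, b, losses

# Toy data: the "next" layer equals the shared layer plus a rank-1 perturbation.
rng = random.Random(1)
d, n = 4, 8
w_shared = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
u = [[rng.uniform(-0.5, 0.5)] for _ in range(d)]
v = [[rng.uniform(-0.5, 0.5) for _ in range(d)]]
w_target = add(w_shared, matmul(u, v))
x = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]

a, b, losses = single_layer_warmup(x, w_shared, w_target, rank=1)
print("loss before warmup:", losses[0])
print("loss after warmup: ", losses[-1])
print("stored fraction:   ", stored_fraction(d, d, 1, 2))
```

In this toy setting the recovered layer stores only A and B instead of a second full matrix, which is where the storage saving comes from. The abstract describes a subsequent Supervised Fine-Tuning stage that starts from this initialization; that stage is not sketched here.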
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
