EfficientLLM: Scalable Pruning-Aware Pretraining for Architecture-Agnostic Edge Language Models

Xingrun Xing; Zheng Liu; Shitao Xiao; Boyan Gao; Yiming Liang; Wanpeng Zhang; Haokun Lin; Guoqi Li; Jiajun Zhang

arXiv:2502.06663·cs.LG·February 13, 2025

Open Access

TL;DR

EfficientLLM introduces a scalable, pruning-aware pretraining method that automatically designs architecture-agnostic edge language models, achieving state-of-the-art performance with significantly reduced parameters.

Contribution

This work presents the first architecture-agnostic, pruning-aware pretraining approach that surpasses human-designed models in efficiency and performance for edge language models.

Findings

01 Outperforms state-of-the-art models with 100M-1B parameters.

02 Achieves top-quality edge language models through scalable pruning.

03 Bridges the gap between compression and direct pretraining methods.

Abstract

Modern large language models (LLMs), driven by scaling laws, achieve emergent intelligence at large model sizes. Recently, increasing concerns about cloud costs, latency, and privacy have made the development of compact edge language models an urgent requirement. In contrast to direct pretraining, which is bounded by the scaling law, this work proposes pruning-aware pretraining, which focuses on retaining the performance of much larger optimized models. It has the following characteristics: 1) Data-scalable: we introduce minimal parameter groups in the LLM and continuously optimize structural pruning, extending post-training pruning methods such as LLM-Pruner and SparseGPT into the pretraining phase. 2) Architecture-agnostic: the LLM architecture is auto-designed using saliency-driven pruning, which for the first time exceeds SoTA human-designed LLMs in modern pretraining. We reveal that it achieves…
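
The saliency-driven structural pruning mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which saliency criterion is used, so the first-order Taylor score |sum(w * g)| below is an assumption (a common choice in structured pruning), and the group names, the toy values, and the functions group_saliency and lowest_saliency_groups are all hypothetical.

```python
from typing import Dict, List


def group_saliency(weights: List[float], grads: List[float]) -> float:
    """Assumed criterion: first-order Taylor saliency of one parameter group.

    |sum_i w_i * g_i| approximates how much the loss would change
    if every weight in the group were set to zero.
    """
    return abs(sum(w * g for w, g in zip(weights, grads)))


def lowest_saliency_groups(
    weights: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    n_prune: int,
) -> List[str]:
    """Return the names of the n_prune groups with the smallest saliency."""
    scores = {name: group_saliency(weights[name], grads[name]) for name in weights}
    ranked = sorted(scores, key=lambda name: scores[name])
    return ranked[:n_prune]


# Hypothetical parameter groups (e.g. an attention head or an FFN channel)
# with made-up weights and gradients, for illustration only.
toy_weights = {
    "head_0": [0.5, -0.2],
    "head_1": [0.01, 0.02],
    "ffn_channel_0": [1.0, 1.0],
}
toy_grads = {
    "head_0": [0.1, 0.3],
    "head_1": [0.5, -0.5],
    "ffn_channel_0": [0.2, 0.1],
}

to_prune = lowest_saliency_groups(toy_weights, toy_grads, n_prune=1)
print(to_prune)
```

In this toy setting the group whose removal is estimated to change the loss least is selected for pruning. In a pruning-aware pretraining loop, a ranking of this kind would presumably be recomputed as training proceeds, rather than once after training as in post-training pruning.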

Taxonomy

Topics: Service-Oriented Architecture and Web Services · Semantic Web and Ontologies · Software System Performance and Reliability