Preparing Lessons for Progressive Training on Language Models

Yu Pan; Ye Yuan; Yichun Yin; Jiaxin Shi; Zenglin Xu; Ming Zhang; Lifeng Shang; Xin Jiang; Qun Liu

arXiv:2401.09192 · cs.LG · February 13, 2024 · 1 citation

TL;DR

The paper introduces Apollo, a novel training method that enables efficient progressive expansion of language models by learning high-layer functions during low-layer training, significantly reducing resource use and environmental impact.

Contribution

Apollo is a new approach that prepares lessons for model expansion using low-value-prioritized sampling, weight sharing, and interpolation, improving training efficiency without pretrained models.
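
As a rough illustration of what low-value-prioritized sampling could look like, the sketch below draws a training depth from a distribution that favors shallow depths. The weighting p(d) ∝ 1/d, the function names, and the 12-layer maximum are assumptions made for illustration only; this page does not state the distribution the paper actually uses.

```python
import random


def lvps_weights(max_depth):
    """Unnormalized weights favoring low depths (assumed form: 1/d)."""
    return [1.0 / d for d in range(1, max_depth + 1)]


def sample_depth(max_depth, rng):
    """Draw one depth in 1..max_depth; shallow depths are more likely."""
    depths = list(range(1, max_depth + 1))
    return rng.choices(depths, weights=lvps_weights(max_depth), k=1)[0]


def expected_depth(max_depth):
    """Mean sampled depth under the assumed weights."""
    weights = lvps_weights(max_depth)
    depths = range(1, max_depth + 1)
    return sum(d * w for d, w in zip(depths, weights)) / sum(weights)
```

Under these assumed weights, the mean depth for a 12-layer maximum is roughly 3.9, compared with 6.5 for uniform sampling, so most steps would run a shallower and cheaper network.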

Findings

01 Achieves state-of-the-art acceleration ratios.
02 Rivals pretrained model-based methods in efficiency.
03 Reduces training time and environmental costs.

Abstract

The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions due to growing model sizes. Prior work suggests using pretrained small models to improve training efficiency, but this approach may not be suitable for new model structures. On the other hand, training from scratch can be slow, and progressively stacking layers often fails to achieve significant acceleration. To address these challenges, we propose a novel method called Apollo, which prepares lessons for expanding operations by learning high-layer functionality during training of low layers. Our approach involves low-value-prioritized sampling (LVPS) to train different depths and weight sharing to facilitate efficient expansion. We also introduce an interpolation method for stable model…
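
To make the contrast between stacking and interpolation more concrete, the sketch below maps each layer of a deeper model back to a layer of the shallower one. The index formulas, the function names, and the use of plain Python lists as stand-ins for layers are assumptions for illustration; they are not taken from the paper or its released code.

```python
def stack_map(src_layers, dst_layers):
    """Stacking: repeat the whole block of source layers end to end."""
    return [i % src_layers for i in range(dst_layers)]


def interpolate_map(src_layers, dst_layers):
    """Interpolation: spread source layers so copies of a layer stay adjacent."""
    return [i * src_layers // dst_layers for i in range(dst_layers)]


def expand(shared, dst_layers, mapping=interpolate_map):
    """Build a deeper layer list that reuses (shares) the source layer objects."""
    return [shared[j] for j in mapping(len(shared), dst_layers)]
```

Going from 3 to 6 layers, the stacking map yields source indices 0, 1, 2, 0, 1, 2, while the interpolation map yields 0, 0, 1, 1, 2, 2. Because `expand` reuses the same objects rather than copying them, the deeper list shares weights with the shallow one in this toy setting.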

Code & Models

Repositories

yuanyehome/Apollo-AAAI-2024-Release (PyTorch, official)

Videos

Preparing Lessons for Progressive Training on Language Models · Underline

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Natural Language Processing Techniques

Methods: Adaptive Parameter-wise Diagonal Quasi-Newton Method