LLaMA Pro: Progressive LLaMA with Block Expansion

Chengyue Wu; Yukang Gan; Yixiao Ge; Zeyu Lu; Jiahao Wang; Ye Feng; Ying Shan; Ping Luo

arXiv:2401.02415 · cs.CL · May 31, 2024 · 1 citation


TL;DR

LLaMA Pro expands a pretrained LLM by adding new Transformer blocks and tuning only those blocks on a new corpus, so the model gains domain knowledge (here, code and math) without catastrophic forgetting and performs strongly on general, programming, and mathematics tasks.

Contribution

The paper presents a post-pretraining block expansion technique for LLMs that adds new knowledge while retaining existing capabilities, avoiding catastrophic forgetting and improving task versatility.
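The block expansion idea can be illustrated with a toy model. The sketch below is a minimal, pure-Python illustration under stated assumptions, not the authors' code: `ToyBlock`, `expand_blocks`, and `run` are hypothetical names; each block is simplified to a residual update `x + W_out f(W_in x)` with no attention or normalization; and each new block is assumed to be a copy of the last block in its group with its output projection zeroed (the role the attention output and MLP down projections would play in a real LLaMA block). Because the zeroed projection turns the copy into an identity mapping, the expanded stack initially computes exactly the same function as the original, and only the new blocks are marked trainable.

```python
import copy
import random
from dataclasses import dataclass
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


@dataclass
class ToyBlock:
    """Hypothetical stand-in for a residual Transformer block: x + W_out f(W_in x)."""
    w_in: Matrix           # hidden x d
    w_out: Matrix          # d x hidden; stands in for the output projections
    trainable: bool = False

    def __call__(self, x: Vector) -> Vector:
        h = relu(matvec(self.w_in, x))
        return [xi + yi for xi, yi in zip(x, matvec(self.w_out, h))]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def expand_blocks(blocks: List[ToyBlock], num_groups: int) -> List[ToyBlock]:
    """Append one identity-initialized copy after each group of blocks (assumed scheme)."""
    if len(blocks) % num_groups != 0:
        raise ValueError("number of blocks must be divisible by num_groups")
    group_size = len(blocks) // num_groups
    expanded: List[ToyBlock] = []
    for i, block in enumerate(blocks):
        block.trainable = False  # freeze the original block in place
        expanded.append(block)
        if (i + 1) % group_size == 0:
            new_block = copy.deepcopy(block)
            # Zero the output projection: the block's update is 0, so it acts as identity.
            new_block.w_out = [[0.0] * len(row) for row in block.w_out]
            new_block.trainable = True  # only the added blocks are tuned on the new corpus
            expanded.append(new_block)
    return expanded


def run(blocks: List[ToyBlock], x: Vector) -> Vector:
    for b in blocks:
        x = b(x)
    return x


# Usage: 8 toy blocks in 2 groups -> 10 blocks, 2 of them trainable.
rng = random.Random(0)
d, hidden = 4, 8
base = [ToyBlock(random_matrix(hidden, d, rng), random_matrix(d, hidden, rng)) for _ in range(8)]
x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
before = run(base, x)
expanded = expand_blocks(base, num_groups=2)
after = run(expanded, x)
print(len(expanded), sum(b.trainable for b in expanded))  # 10 2
print(after == before)  # True
```

Zeroing only the output projection, rather than the whole block, keeps the copied input weights intact, so the new block starts from the preceding block's features while contributing nothing to the residual stream at initialization.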

Findings

1. LLaMA Pro-8.3B outperforms existing open models in the LLaMA family across benchmarks.
2. It integrates knowledge from the new code and math corpus without catastrophic forgetting.
3. It performs strongly on programming and mathematics tasks.

Abstract

Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only the new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance among various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent…
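As a rough arithmetic check on the 8.3B figure for a model initialized from LLaMA2-7B, the sketch below counts parameters for a LLaMA2-7B-style decoder. The configuration values (hidden size 4096, MLP size 11008, vocabulary 32000, untied input embedding and output head, bias-free projections) are assumptions taken from the public LLaMA2-7B configuration, and the expansion from 32 to 40 blocks is likewise an assumption not stated on this page; `decoder_block_params` and `model_params` are hypothetical helper names.

```python
# Assumed LLaMA2-7B configuration values (not stated on this page).
HIDDEN = 4096
INTERMEDIATE = 11008
VOCAB = 32000
BASE_LAYERS = 32
ADDED_LAYERS = 8  # assumption: 32 -> 40 blocks after expansion


def decoder_block_params(hidden: int, intermediate: int) -> int:
    attention = 4 * hidden * hidden      # q, k, v, o projections, no bias
    mlp = 3 * hidden * intermediate      # gate, up, down projections, no bias
    norms = 2 * hidden                   # two RMSNorm weight vectors
    return attention + mlp + norms


def model_params(layers: int, hidden: int = HIDDEN,
                 intermediate: int = INTERMEDIATE, vocab: int = VOCAB) -> int:
    embeddings = 2 * vocab * hidden      # untied input embedding and LM head
    final_norm = hidden
    return layers * decoder_block_params(hidden, intermediate) + embeddings + final_norm


print(f"{model_params(BASE_LAYERS):,}")                 # 6,738,415,616
print(f"{model_params(BASE_LAYERS + ADDED_LAYERS):,}")  # 8,357,482,496
```

Under these assumptions the eight added blocks contribute about 1.62B parameters, for roughly 8.36B in total, which is consistent with the 8.3B in the model name.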


Code & Models

Repositories

tencentarc/llama-pro (PyTorch, official)

Videos

LLaMA Pro: Progressive LLaMA with Block Expansion (Paper Explained) · YouTube

LLaMA Pro: Progressive LLaMA with Block Expansion · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Label Smoothing · Adam · Dropout · Absolute Position Encodings · Layer Normalization