LLaMA Pro: Progressive LLaMA with Block Expansion
Chengyue Wu, Yukang Gan, Yixiao Ge, Zeyu Lu, Jiahao Wang, Ye Feng, Ying Shan, Ping Luo

TL;DR
LLaMA Pro expands a pretrained LLM by adding Transformer blocks and tuning only those added blocks on new data. The model gains new knowledge without catastrophic forgetting and performs well across general, programming, and mathematics tasks.
Contribution
The paper presents a novel post-pretraining block expansion technique for LLMs, improving knowledge retention and task versatility without catastrophic forgetting.
Findings
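LLaMA Pro-8.3B outperforms existing open models in the LLaMA family on the reported benchmarks.
Tuning only the expanded blocks on a new corpus integrates new knowledge without catastrophic forgetting.
The model performs strongly on programming and mathematics tasks while remaining capable on general tasks.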
Abstract
Humans generally acquire new skills without compromising the old; however, the opposite holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we propose a new post-pretraining method for LLMs with an expansion of Transformer blocks. We tune the expanded blocks using only new corpus, efficiently and effectively improving the model's knowledge without catastrophic forgetting. In this paper, we experiment on the corpus of code and math, yielding LLaMA Pro-8.3B, a versatile foundation model initialized from LLaMA2-7B, excelling in general tasks, programming, and mathematics. LLaMA Pro and its instruction-following counterpart (LLaMA Pro-Instruct) achieve advanced performance among various benchmarks, demonstrating superiority over existing open models in the LLaMA family and the immense potential of reasoning and addressing diverse tasks as an intelligent…
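The block expansion described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes each new block is a copy of an existing block whose output weight is set to zero, so the added block initially acts as an identity mapping, and that only the added blocks are marked trainable. `ToyBlock`, `expand_blocks`, `forward`, and the counts 32 and 4 are hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ToyBlock:
    """Scalar stand-in for a residual block: x + w_out * tanh(w_in * x)."""
    w_in: float
    w_out: float
    trainable: bool = True

    def __call__(self, x: float) -> float:
        return x + self.w_out * math.tanh(self.w_in * x)


def expand_blocks(blocks: List[ToyBlock], group_size: int) -> List[ToyBlock]:
    """Insert one identity-initialized copy after every `group_size` blocks.

    Assumption: zeroing the output weight of the copy makes the residual
    branch contribute nothing, so the expanded stack starts out computing
    the same function as the original stack.
    """
    if group_size <= 0 or len(blocks) % group_size != 0:
        raise ValueError("len(blocks) must be a positive multiple of group_size")
    expanded: List[ToyBlock] = []
    for position, block in enumerate(blocks, start=1):
        # Original blocks are kept and frozen.
        expanded.append(replace(block, trainable=False))
        if position % group_size == 0:
            # The added block is the only part tuned on the new corpus.
            expanded.append(replace(block, w_out=0.0, trainable=True))
    return expanded


def forward(blocks: List[ToyBlock], x: float) -> float:
    """Apply the blocks in order."""
    for block in blocks:
        x = block(x)
    return x


# Illustrative sizes only: 32 base blocks, one new block per group of 4.
base = [ToyBlock(w_in=0.1 * (i + 1), w_out=0.05) for i in range(32)]
expanded = expand_blocks(base, group_size=4)
```

Because every added block starts as an identity mapping, `forward(expanded, x)` matches `forward(base, x)` before any tuning. Freezing the original blocks is what keeps the previously learned behaviour intact in this sketch.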
Code & Models
- 🤗 DavidAU/Gemma-The-Writer-N-Restless-Quill-10B-Uncensored-GGUF · model · 6.9k dl · ♡ 136
- 🤗 DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF · model · 2.5k dl · ♡ 67
- 🤗 DavidAU/LLama-3.1-128k-Darkest-Planet-Uncensored-16.5B-GGUF · model · 981 dl · ♡ 15
- 🤗 DavidAU/Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER · model · 389 dl · ♡ 37
- 🤗 DavidAU/Mistral-Nemo-Instruct-2407-13.35B-BRAINSTORM-5x-FORM-11-GGUF · model · 524 dl · ♡ 8
- 🤗 DavidAU/Qwen3-53B-A3B-2507-TOTAL-RECALL-v2-MASTER-CODER · model · 32 dl · ♡ 16
- 🤗 DavidAU/Qwen3-Coder-53B-A3B-Instruct-TOTAL-RECALL-v2-MASTER-CODER-L · model · 28 dl · ♡ 14
- 🤗 TencentARC/MetaMath-Mistral-Pro · model · 8 dl · ♡ 5
- 🤗 trollek/NinjaMouse-2.4B-32L-danube · model · 111 dl · ♡ 8
- 🤗 chenlei1983/testtohf · model · 1 dl · ♡ 1
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Softmax · Label Smoothing · Adam · Dropout · Absolute Position Encodings · Layer Normalization
