Control LLM: Controlled Evolution for Intelligence Retention in LLM
Haichao Wei, Yunxiang Ren, Zhoutong Fu, Aman Lunia, Yi-Lin Chen, Alice Leung, Ya Xu

TL;DR
Control LLM introduces a novel method using parallel transformer blocks and interpolation to mitigate catastrophic forgetting, enabling continual learning in LLMs with improved performance and minimal degradation.
Contribution
It presents a new approach that preserves existing knowledge while integrating new information in LLMs through parallel blocks and state interpolation.
Findings
Significant improvements in mathematical reasoning and coding performance.
Enhanced multilingual capabilities across multiple benchmarks.
Achieves state-of-the-art results among open-source models with less data and compute.
Abstract
Large Language Models (LLMs) demand significant computational resources, making it essential to enhance their capabilities without retraining from scratch. A key challenge in this domain is catastrophic forgetting (CF), which hampers performance during Continuous Pre-training (CPT) and Continuous Supervised Fine-Tuning (CSFT). We propose Control LLM, a novel approach that leverages parallel pre-trained and expanded transformer blocks, aligning their hidden states through interpolation strategies. This method effectively preserves performance on existing tasks while seamlessly integrating new knowledge. Extensive experiments demonstrate the effectiveness of Control LLM in both CPT and CSFT. On Llama3.1-8B-Instruct, it achieves significant improvements in mathematical reasoning (on Math-Hard) and coding performance (on MBPP-PLUS). On Llama3.1-8B, it…
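The idea of running a pre-trained block and an expanded block in parallel and interpolating their hidden states can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain linear interpolation with a fixed weight `alpha`, represents hidden states as lists of floats, and uses toy stand-in functions (`frozen_block`, `trainable_block`) in place of real transformer blocks. All names here are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def lerp(h_pre: Vector, h_exp: Vector, alpha: float) -> Vector:
    """Linear interpolation: (1 - alpha) * h_pre + alpha * h_exp."""
    if len(h_pre) != len(h_exp):
        raise ValueError("hidden states must have the same dimension")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(h_pre, h_exp)]


def parallel_layer(h: Vector, pretrained_block: Block,
                   expanded_block: Block, alpha: float) -> Vector:
    """Feed the same input to both blocks, then blend their outputs.

    Assumption: the pre-trained block is kept frozen and only the
    expanded block would be updated during training.
    """
    h_pre = pretrained_block(h)  # output of the original (frozen) block
    h_exp = expanded_block(h)    # output of the added (trainable) block
    return lerp(h_pre, h_exp, alpha)


# Toy stand-ins for transformer blocks (hypothetical, for illustration only).
def frozen_block(h: Vector) -> Vector:
    return [2.0 * x for x in h]


def trainable_block(h: Vector) -> Vector:
    return [2.0 * x + 1.0 for x in h]


if __name__ == "__main__":
    hidden = [1.0, 2.0]
    print(parallel_layer(hidden, frozen_block, trainable_block, alpha=0.5))
```

With `alpha=0.0` the layer reproduces the pre-trained block exactly, which is one way to see how interpolation can limit drift from the original model; larger values of `alpha` give the expanded block more influence.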
Code & Models
- 🤗 ControlLLM/Llama3.1-8B-OpenMath16-Instruct (model, 14 downloads)
- 🤗 ControlLLM/Control-LLM-Llama3.1-8B-Math16-Instruct (model, 18 downloads)
- 🤗 ControlLLM/Llama-3.1-8B-OpenCoder16-Instruct (model, 10 downloads)
- 🤗 ControlLLM/Control-LLM-Llama3.1-8B-OpenCoder8-Instruct (model, 20 downloads)
- 🤗 ControlLLM/Llama-3.1-8B-SynE-FPT (model, 10 downloads)
- 🤗 ControlLLM/Llama-3.1-8B-SynE-Hybrid16 (model, 8 downloads)
- 🤗 ControlLLM/Llama-3.1-8B-SynE-Concat16-Lerp (model, 4 downloads)
- 🤗 ControlLLM/Llama-3.1-8B-SynE-Concat16-Dlerp (model, 5 downloads)
Taxonomy
Topics: Scheduling and Optimization Algorithms
Methods: Balanced Selection
