When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering
Stephen Choi, William Gazeley

TL;DR
The paper introduces LLM-ADE, a dynamic framework for continued pre-training of large language models that improves adaptability and knowledge retention through architectural adjustments, demonstrated on TinyLlama with notable performance gains.
Contribution
LLM-ADE is a novel methodology employing adaptive architectural strategies for effective continued pre-training of LLMs, addressing catastrophic forgetting and double descent.
Findings
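Significant performance improvements for TinyLlama across general knowledge benchmarks.
Mitigation of catastrophic forgetting through selective block freezing and expansion.
Improved adaptability to new data while preserving previously acquired knowledge.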
Abstract
This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preserving previously acquired knowledge. We demonstrate LLM-ADE's effectiveness on the TinyLlama model across various general knowledge benchmarks, showing significant performance improvements without the drawbacks of traditional continuous training methods. This approach promises a more versatile and robust way to keep LLMs current and efficient in real-world applications.
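The abstract names selective block freezing and expansion but does not spell out the selection rule. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes each block has a precomputed importance score, that the highest-scoring blocks stay trainable and receive an extra trainable copy, and that all other blocks are frozen. The names Block, plan_blocks, and n_expand, the "_expanded" suffix, and the scores themselves are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Block:
    """Hypothetical description of one transformer block in the training plan."""
    name: str
    trainable: bool


def plan_blocks(names: Sequence[str], scores: Sequence[float], n_expand: int) -> List[Block]:
    """Build a freeze/expand plan from per-block importance scores.

    Assumption (not from the paper): the n_expand highest-scoring blocks stay
    trainable and each gets a trainable copy inserted right after it; every
    other block is frozen.
    """
    if len(names) != len(scores):
        raise ValueError("names and scores must have the same length")
    if not 0 <= n_expand <= len(names):
        raise ValueError("n_expand must be between 0 and the number of blocks")

    # Rank block indices by descending score; ties are broken by position.
    ranked = sorted(range(len(names)), key=lambda i: (-scores[i], i))
    selected = set(ranked[:n_expand])

    plan: List[Block] = []
    for i, name in enumerate(names):
        # The original block is trainable only if it was selected.
        plan.append(Block(name, trainable=i in selected))
        if i in selected:
            # Expansion: add a new trainable block after the selected one.
            plan.append(Block(name + "_expanded", trainable=True))
    return plan


if __name__ == "__main__":
    # Illustrative scores only; they are not results from the paper.
    for block in plan_blocks(["block0", "block1", "block2", "block3"],
                             [0.1, 0.9, 0.3, 0.7], n_expand=2):
        print(block.name, "trainable" if block.trainable else "frozen")
```

In a PyTorch model, a frozen entry in such a plan would typically correspond to setting requires_grad to False on that block's parameters, so that only the selected and newly added blocks are updated during continued pre-training.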
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Semantic Web and Ontologies
