When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering

Stephen Choi; William Gazeley

arXiv:2404.13028·cs.CE·April 22, 2024

TL;DR

The paper introduces LLM-ADE, a dynamic framework for continued pre-training of large language models that improves adaptability and knowledge retention through architectural adjustments, demonstrated on TinyLlama with notable performance gains.

Contribution

LLM-ADE is a novel methodology employing adaptive architectural strategies for effective continued pre-training of LLMs, addressing catastrophic forgetting and double descent.

Findings

01. Significant performance improvements on various benchmarks.

02. Effective mitigation of catastrophic forgetting.

03. Enhanced model adaptability and robustness.

Abstract

This paper presents the LLM-ADE framework, a novel methodology for continued pre-training of large language models (LLMs) that addresses the challenges of catastrophic forgetting and double descent. LLM-ADE employs dynamic architectural adjustments, including selective block freezing and expansion, tailored to specific datasets. This strategy enhances model adaptability to new data while preserving previously acquired knowledge. We demonstrate LLM-ADE's effectiveness on the TinyLlama model across various general knowledge benchmarks, showing significant performance improvements without the drawbacks of traditional continuous training methods. This approach promises a more versatile and robust way to keep LLMs current and efficient in real-world applications.
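The abstract names two architectural adjustments, selective block freezing and block expansion, without giving details here. The following is a minimal sketch of the bookkeeping involved, not the authors' implementation. It uses plain Python stand-ins rather than a real model, and the `Block` class, both helper functions, and the choice of which indices to freeze or expand are all hypothetical; the abstract does not say how blocks are selected for a given dataset.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one transformer block (bookkeeping only)."""
    name: str
    trainable: bool = True


def freeze_blocks(blocks: List[Block], frozen: Set[int]) -> List[Block]:
    """Return a copy in which blocks at the given indices are not trainable."""
    return [replace(b, trainable=(i not in frozen)) for i, b in enumerate(blocks)]


def expand_blocks(blocks: List[Block], expand_after: Set[int]) -> List[Block]:
    """Return a copy with a new trainable block inserted after each chosen index."""
    out: List[Block] = []
    for i, b in enumerate(blocks):
        out.append(b)
        if i in expand_after:
            out.append(Block(name=f"{b.name}_expanded", trainable=True))
    return out


# A toy four-block model; real block counts depend on the architecture.
base = [Block(f"block_{i}") for i in range(4)]

# Assumed selection for illustration: freeze blocks 0 and 1, expand after block 3.
frozen = freeze_blocks(base, {0, 1})
adapted = expand_blocks(frozen, {3})

trainable_names = [b.name for b in adapted if b.trainable]
print(trainable_names)
```

In an actual training framework, freezing would correspond to excluding a block's parameters from gradient updates, and an expanded block would need real weights and an initialization scheme; neither is modeled in this sketch.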

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing · Semantic Web and Ontologies