Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation

Shamane Siriwardhana; Mark McQuade; Thomas Gauthier; Lucas Atkins; Fernando Fernandes Neto; Luke Meyers; Anneketh Vij; Tyler Odenthal; Charles Goddard; Mary MacCarthy; Jacob Solawetz

arXiv:2406.14971·cs.CL·June 24, 2024·2 cites

Open Access

TL;DR

This paper evaluates domain adaptation techniques for Llama3-70B-Instruct using continual pre-training and model merging, focusing on improving domain-specific performance while reducing catastrophic forgetting.

Contribution

It provides a comprehensive assessment of domain adaptation methods, including continual pre-training and model merging, for large language models in specialized domains.

Findings

01

Enhanced domain-specific performance after adaptation

02

Effective mitigation of catastrophic forgetting

03

Insights into model merging techniques

Abstract

We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model using SEC data, evaluating its performance on both general and domain-specific benchmarks. Our focus was on continual pre-training (CPT) and model merging, with the aim of enhancing the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instruction-following abilities. The model is available on Hugging Face as arcee-ai/Llama-3-SEC-Base: https://huggingface.co/arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough…
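
The abstract does not say which merging technique was used, so the following is only an illustrative sketch of the general idea behind weight-space merging: blending the parameters of an instruction-tuned checkpoint with those of a continually pre-trained one. It assumes plain linear interpolation, which may differ from the authors' method. The function name `linear_merge`, the mixing weight `alpha`, and the toy parameter dictionaries are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def linear_merge(base: StateDict, adapted: StateDict, alpha: float) -> StateDict:
    """Interpolate parameter-wise: (1 - alpha) * base + alpha * adapted.

    Assumption: simple linear interpolation, used here for illustration only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != adapted.keys():
        raise ValueError("checkpoints must share the same parameter names")

    merged: StateDict = {}
    for name, base_values in base.items():
        adapted_values = adapted[name]
        if len(base_values) != len(adapted_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * a
            for b, a in zip(base_values, adapted_values)
        ]
    return merged


# Hypothetical toy checkpoints (not real model weights).
instruct = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
cpt = {"layer.weight": [3.0, 6.0], "layer.bias": [1.0]}

merged = linear_merge(instruct, cpt, alpha=0.5)
print(merged)
```

With `alpha=0.0` the result equals the instruction-tuned weights and with `alpha=1.0` it equals the continually pre-trained weights. Intermediate values trade domain knowledge against retained general ability, which is the tension the abstract describes.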

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Speech Recognition and Synthesis
