Amuro and Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

Kaiser Sun; Mark Dredze

arXiv:2408.06663·cs.CL·March 19, 2025

TL;DR

This paper investigates how pre-training and fine-tuning influence large language models, revealing that continual pre-training enhances latent capabilities, fine-tuning can cause knowledge forgetting, and more pre-training reduces prompt sensitivity.

Contribution

It provides a comprehensive analysis of the relationship between pre-training and fine-tuning, highlighting the benefits and drawbacks of continual pre-training and fine-tuning strategies.

Findings

01 Continual pre-training improves model capabilities in a latent manner.

02 Extra fine-tuning benefits datasets where the model initially underperforms.

03 Fine-tuning can lead to forgetting of previously learned domain knowledge.

Abstract

The development of large language models has led to a pre-train-then-align paradigm, in which a model is typically pre-trained on a large text corpus and then undergoes a tuning stage that aligns it with human preferences or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. Our results on 18 datasets suggest that i) continual pre-training improves the model in a latent way that is revealed only after fine-tuning; ii) with extra fine-tuning, datasets on which the model does not demonstrate capability improve much more than those on which it already performs well during the pre-training stage; iii) although the model benefits significantly from supervised fine-tuning, it may forget previously known domain knowledge and tasks that are not seen during fine-tuning; iv) the…
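The checkpoint-level comparison described in the abstract can be pictured with a minimal sketch. It is not the authors' code: the function `compare_checkpoints` and its `load`, `fine_tune`, and `evaluate` arguments are hypothetical placeholders that a reader would supply for their own model and datasets. The sketch only shows the shape of the experiment, which is to score each intermediate checkpoint before and after fine-tuning and record the difference.

```python
from typing import Callable, Dict, List, TypeVar

Model = TypeVar("Model")


def compare_checkpoints(
    checkpoints: List[str],
    load: Callable[[str], Model],          # hypothetical: returns a model for a checkpoint name
    fine_tune: Callable[[Model], Model],   # hypothetical: returns a fine-tuned copy of the model
    evaluate: Callable[[Model], float],    # hypothetical: returns a score on one dataset
) -> List[Dict[str, object]]:
    """Score every checkpoint before and after fine-tuning."""
    rows: List[Dict[str, object]] = []
    for name in checkpoints:
        base_model = load(name)
        base_score = evaluate(base_model)
        tuned_score = evaluate(fine_tune(base_model))
        rows.append(
            {
                "checkpoint": name,
                "base": base_score,
                "tuned": tuned_score,
                # Gain attributable to fine-tuning at this pre-training stage.
                "gain": tuned_score - base_score,
            }
        )
    return rows
```

Under these assumptions, a latent improvement of the kind reported in finding i) would appear as a `base` column that stays roughly flat across checkpoints while the `tuned` column rises. Running the same loop on a dataset not used for fine-tuning is one way to look for the forgetting described in iii).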

Code & Models

Models

KaiserWhoLearns/PTvsSFT_OLMo1b (Hugging Face model)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN