Mechanistic Analysis of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning
Olaf Yunus Laitinen Imanov

TL;DR
This paper gives a mechanistic account of why catastrophic forgetting occurs in large language models during sequential fine-tuning, identifying three drivers: gradient interference in attention weights, representational drift in intermediate layers, and loss landscape flattening.
Contribution
It offers the first comprehensive analysis of the mechanisms behind catastrophic forgetting in transformer-based LLMs during continual fine-tuning, across multiple model scales and task sequences.
Findings
Gradient interference in attention weights contributes to forgetting.
Representational drift occurs in intermediate layers during fine-tuning.
Forgetting severity correlates with task similarity and gradient alignment.
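To make "gradient alignment" concrete, here is a minimal sketch of one common way to quantify it: the cosine similarity between the gradient of an earlier task's loss and the gradient of the new task's loss, taken over the same parameters. This is an illustration only; the summary above does not state which alignment metric the paper uses, and the function name `cosine_alignment` and the toy vectors are assumptions made for this example.

```python
import math


def cosine_alignment(grad_a, grad_b):
    """Cosine similarity between two flattened gradient vectors.

    Values near -1 mean the updates point in opposing directions
    (interference); values near 0 mean they are roughly independent;
    values near +1 mean they reinforce each other.
    """
    if len(grad_a) != len(grad_b):
        raise ValueError("gradients must have the same length")
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero gradient has no direction.
        return 0.0
    return dot / (norm_a * norm_b)


# Toy gradients with made-up values, not data from the paper.
g_old_task = [1.0, 0.0, -2.0]
g_new_conflicting = [-1.0, 0.0, 2.0]   # exactly opposes the old-task gradient
g_new_orthogonal = [0.0, 3.0, 0.0]     # touches a different direction

conflict_score = cosine_alignment(g_old_task, g_new_conflicting)
orthogonal_score = cosine_alignment(g_old_task, g_new_orthogonal)
```

In practice the two vectors would be the flattened gradients of, for example, one attention weight matrix under each task's loss; a strongly negative score is the situation usually described as gradient interference.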
Abstract
Large language models exhibit remarkable performance across diverse tasks through pre-training and fine-tuning paradigms. However, continual fine-tuning on sequential tasks induces catastrophic forgetting, where newly acquired knowledge interferes with previously learned capabilities. Despite widespread observations of this phenomenon, the mechanistic understanding remains limited. Here, we present a comprehensive mechanistic analysis of catastrophic forgetting in transformer-based LLMs during sequential fine-tuning. Through systematic experiments across multiple model scales (109B to 400B total parameters) and task sequences, we identify three primary mechanisms driving forgetting: gradient interference in attention weights, representational drift in intermediate layers, and loss landscape flattening. We demonstrate that forgetting severity correlates strongly with task similarity…
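Representational drift can be illustrated with a simple measurement: feed the same inputs through a layer before and after fine-tuning and compare the resulting activation vectors. The sketch below uses mean cosine distance as the drift score. This is an assumed, generic metric chosen for illustration; the truncated abstract does not say how the paper measures drift, and the name `mean_drift` and the toy activations are hypothetical.

```python
import math


def _cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_drift(before, after):
    """Mean cosine distance between paired activation vectors.

    `before[i]` and `after[i]` are one layer's activations for the same
    input, taken before and after fine-tuning. 0.0 means the direction of
    every representation is unchanged; larger values mean more drift.
    """
    if not before or len(before) != len(after):
        raise ValueError("need two non-empty, equally sized activation lists")
    distances = [1.0 - _cosine(b, a) for b, a in zip(before, after)]
    return sum(distances) / len(distances)


# Toy activations for two inputs (made-up values, not data from the paper).
acts_before = [[1.0, 0.0], [0.0, 1.0]]
acts_after = [[0.0, 1.0], [0.0, 1.0]]   # first input rotated, second unchanged

drift = mean_drift(acts_before, acts_after)
```

Computing this score per layer would show where in the network representations move most, which is the kind of evidence behind a claim that drift concentrates in intermediate layers.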
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Memory Processes and Influences · Multimodal Machine Learning Applications
