Mitigating Catastrophic Forgetting with Adaptive Transformer Block Expansion in Federated Fine-Tuning

Yujia Huo; Jianchun Liu; Hongli Xu; Zhenguo Ma; Shilong Wang; Liusheng Huang

arXiv:2506.05977·cs.LG·June 9, 2025

Open Access

TL;DR

FedBE introduces an adaptive transformer block expansion method for federated fine-tuning of large language models, effectively mitigating catastrophic forgetting and improving model generalization in heterogeneous distributed environments.

Contribution

The paper proposes FedBE, a novel federated fine-tuning framework that dynamically expands and allocates transformer blocks to address catastrophic forgetting and heterogeneity.
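The summary does not spell out how FedBE expands or allocates blocks, so the following is only a toy sketch of the general block-expansion idea, not the paper's algorithm. It assumes that the original blocks are frozen, that new blocks are zero-initialised so the expanded model starts out computing the same function, and that each client receives a number of new blocks matching its capacity. The names `Block`, `expand_blocks`, and `allocate_positions`, and the evenly spaced placement rule, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Toy residual block, y = x + weight * x, standing in for a transformer block."""
    weight: float
    trainable: bool = True

    def forward(self, x: float) -> float:
        return x + self.weight * x


def allocate_positions(num_blocks: int, budget: int) -> List[int]:
    """Hypothetical rule: spread `budget` insertion points evenly over the depth."""
    if budget <= 0:
        return []
    budget = min(budget, num_blocks)
    step = num_blocks / budget
    return [int((k + 1) * step) - 1 for k in range(budget)]


def expand_blocks(blocks: List[Block], positions: List[int]) -> List[Block]:
    """Freeze every original block and insert a zero-initialised trainable block
    after each index in `positions` (assumed scheme, not taken from the paper)."""
    wanted = set(positions)
    expanded: List[Block] = []
    for i, block in enumerate(blocks):
        expanded.append(Block(weight=block.weight, trainable=False))
        if i in wanted:
            # weight 0.0 makes the new block an identity map at initialisation
            expanded.append(Block(weight=0.0, trainable=True))
    return expanded


def run(blocks: List[Block], x: float) -> float:
    """Apply the blocks in order."""
    for block in blocks:
        x = block.forward(x)
    return x


base = [Block(0.5), Block(-0.25), Block(0.1), Block(0.2)]
# A weaker client might get a budget of 1, a stronger one a budget of 2.
weak_client = expand_blocks(base, allocate_positions(len(base), 1))
strong_client = expand_blocks(base, allocate_positions(len(base), 2))
```

Because only the inserted blocks are trainable, the pretrained weights stay untouched during fine-tuning, which is the intuition behind using expansion to limit forgetting. How FedBE actually chooses positions and aggregates expanded blocks across clients is not described in this summary.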

Findings

01. Achieves 12-74% higher accuracy retention compared to existing methods.

02. Accelerates model convergence by 1.9-3.1 times.

03. Effectively handles data and device heterogeneity in federated settings.

Abstract

Federated fine-tuning (FedFT) of large language models (LLMs) has emerged as a promising solution for adapting models to distributed data environments while ensuring data privacy. Existing FedFT methods predominantly use parameter-efficient fine-tuning (PEFT) techniques to reduce communication and computation overhead. However, they often fail to adequately address catastrophic forgetting, a critical challenge arising from continual adaptation in distributed environments. Traditional centralized fine-tuning methods, which are not designed for the heterogeneous and privacy-constrained nature of federated environments, struggle to mitigate this issue effectively. Moreover, the challenge is exacerbated by significant variation in data distributions and device capabilities across clients, which intensifies forgetting and degrades model generalization. To…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks