FedFrozen: Two-Stage Federated Optimization via Attention Kernel Freezing

Junye Du; Zhenghao Li; Yushi Feng; Long Feng

arXiv:2605.06446 · cs.LG · May 8, 2026

TL;DR

FedFrozen introduces a two-stage federated learning method that enhances Transformer robustness in heterogeneous environments by freezing attention kernels after initial warm-up training.

Contribution

This work proposes FedFrozen, a two-stage federated optimization framework that freezes the query/key block of attention (the attention kernel) after a warm-up stage to improve stability and performance under client heterogeneity.

Findings

01 FedFrozen improves model stability under client heterogeneity.

02 The warm-up stage acts as an inexact descent on a regularized kernel-profile objective.

03 Freezing the attention kernel after warm-up enhances federated training effectiveness.
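
The two-stage schedule named in the findings can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the parameter names `W_q`, `W_k`, `W_v`, the helpers `local_update`, `fed_avg`, and `train`, the fixed per-client gradients, and the unweighted averaging are all assumptions made for the example. In particular, it assumes that stage two keeps the query/key block at its warm-up value and continues to train everything else.

```python
from typing import Dict, List

Params = Dict[str, List[float]]

# Assumption: these names identify the query/key block (the attention kernel).
KERNEL_KEYS = ("W_q", "W_k")


def local_update(global_params: Params, grads: Params, lr: float, frozen: bool) -> Params:
    """One hypothetical local SGD step; kernel parameters are skipped when frozen."""
    new_params: Params = {}
    for name, values in global_params.items():
        if frozen and name in KERNEL_KEYS:
            new_params[name] = list(values)  # kernel stays at its warm-up value
        else:
            new_params[name] = [v - lr * g for v, g in zip(values, grads[name])]
    return new_params


def fed_avg(client_params: List[Params]) -> Params:
    """Unweighted average of client parameter dicts (plain FedAvg-style aggregation)."""
    n = len(client_params)
    return {
        name: [sum(vals) / n for vals in zip(*(p[name] for p in client_params))]
        for name in client_params[0]
    }


def train(global_params: Params, client_grads: List[Params],
          rounds: int, warmup_rounds: int, lr: float = 0.1) -> Params:
    """Stage 1 (warm-up): update all blocks. Stage 2: freeze the kernel block."""
    for r in range(rounds):
        frozen = r >= warmup_rounds
        updates = [local_update(global_params, g, lr, frozen) for g in client_grads]
        global_params = fed_avg(updates)
    return global_params


# Toy run: two heterogeneous clients with different (fixed) gradients.
init = {"W_q": [1.0], "W_k": [1.0], "W_v": [1.0]}
client_grads = [
    {"W_q": [1.0], "W_k": [1.0], "W_v": [1.0]},
    {"W_q": [3.0], "W_k": [3.0], "W_v": [3.0]},
]
final = train(init, client_grads, rounds=3, warmup_rounds=1)
```

In this toy run `W_q` and `W_k` change only during the single warm-up round, while `W_v` keeps moving in every round. A real setup would replace the fixed gradients with actual local training on each client's data.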

Abstract

Federated learning with heterogeneous clients remains a significant challenge for deep learning, primarily due to client drift arising from inconsistent local updates. Existing federated optimization methods typically address this issue through objective-level regularization or update-correction mechanisms. Recent studies, however, suggest that Transformer-based architectures may be inherently more robust than conventional models under heterogeneous federated training. Motivated by this observation, we investigate how different parameter components within the attention mechanism influence federated optimization. Specifically, we decompose the attention module into a query/key block, which determines the attention kernel, and a value block, which performs semantic transformation under the induced kernel. Based on this perspective, we propose FedFrozen, a two-stage federated optimization…
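
The query/key versus value split described in the abstract can be written out for standard single-head scaled dot-product attention. The notation below ($X$, $W_Q$, $W_K$, $W_V$, $d_k$, $A$) is the conventional one and is an assumption here; the paper's own symbols are not shown in this excerpt.

```latex
% Attention kernel: depends only on the query/key block (W_Q, W_K).
A(X) = \operatorname{softmax}\!\left( \frac{(X W_Q)(X W_K)^{\top}}{\sqrt{d_k}} \right)

% Value block: transforms the tokens under the kernel induced above.
\operatorname{Attn}(X) = A(X)\, X W_V
```

With $W_Q$ and $W_K$ held fixed, $A(X)$ is a fixed function of the input, and the output is linear in $W_V$.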
