FedSQ: Optimized Weight Averaging via Fixed Gating

Cristian Pérez-Corral; Jose I. Mestre; Alberto Fernández-Hernández; Manuel F. Dolz; José Duato; Enrique S. Quintana-Ortí

arXiv:2604.02990 · cs.LG · April 6, 2026

TL;DR

FedSQ is a federated learning method that stabilizes training on heterogeneous client data by freezing a structural copy of a pretrained model, which fixes its binary gating masks, and fine-tuning only a quantitative copy, improving robustness and efficiency.

Contribution

It introduces FedSQ, a novel approach that uses fixed binary gating masks, induced by a frozen structural copy of a pretrained model, to stabilize federated fine-tuning.
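A minimal sketch of the fixed-gating idea for a plain ReLU multilayer perceptron follows. It assumes (our reading of the abstract, not the paper's code) that each hidden layer's binary mask comes from the frozen structural copy's own pre-activations, while the trainable quantitative copy supplies the values passed through that mask. The helpers `affine` and `dualcopy_forward` are hypothetical names, and the weights are toy values.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, bias b) of one dense layer


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Return W @ x + b for one dense layer."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def dualcopy_forward(structural: List[Layer], quantitative: List[Layer], x: Vector) -> Vector:
    """Hypothetical DualCopy forward pass for a ReLU MLP.

    Both copies have identical shapes. The frozen structural copy runs an
    ordinary ReLU forward pass and yields one binary mask per hidden layer;
    the trainable quantitative copy computes the values that pass through
    those masks. The structural copy's output layer is never used.
    """
    hidden_s = structural[:-1]
    hidden_q, (W_out, b_out) = quantitative[:-1], quantitative[-1]
    h_s, h_q = x, x
    for (W_s, b_s), (W_q, b_q) in zip(hidden_s, hidden_q):
        z_s = affine(W_s, b_s, h_s)  # structural pre-activations
        z_q = affine(W_q, b_q, h_q)  # quantitative pre-activations
        mask = [1.0 if z > 0 else 0.0 for z in z_s]  # gate set by the structural copy
        h_s = [m * z for m, z in zip(mask, z_s)]  # equals ReLU(z_s)
        h_q = [m * z for m, z in zip(mask, z_q)]  # not ReLU(z_q): the gate is borrowed
    return affine(W_out, b_out, h_q)


# Toy two-layer network: 2 inputs -> 2 hidden units -> 1 output.
structural = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
quantitative = [([[-1.0, 0.0], [0.0, 0.0]], [0.0, 0.0]), ([[1.0, 1.0]], [0.0])]
x = [2.0, 1.0]
print(dualcopy_forward(structural, structural, x))    # [2.5]: equal copies give a plain ReLU net
print(dualcopy_forward(structural, quantitative, x))  # [-2.0]: a negative value passes the fixed gate
```

When both copies are equal, the sketch reduces to an ordinary ReLU network. Once the quantitative weights drift, the gating pattern stays pinned to the pretrained structural copy, which is the sense in which the gates are "fixed" in this illustration.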

Findings

1. FedSQ improves robustness under data heterogeneity.
2. FedSQ reduces the number of communication rounds needed to reach the best validation performance.
3. FedSQ preserves accuracy in transfer learning scenarios.
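Assuming the round count in the second finding means the first communication round at which a run reaches its best validation score, the measure can be computed from a per-round validation curve as sketched below. `rounds_to_best` is a hypothetical helper, and the curve is toy data, not a result from the paper.

```python
from typing import Sequence


def rounds_to_best(val_scores: Sequence[float]) -> int:
    """Return the 1-based round at which the best validation score first appears.

    val_scores[i] is the validation metric after communication round i + 1
    (higher is better). Ties resolve to the earliest round.
    """
    if not val_scores:
        raise ValueError("val_scores must contain at least one round")
    best = max(val_scores)
    return next(r for r, s in enumerate(val_scores, start=1) if s == best)


# Toy curve for illustration only (not results from the paper).
print(rounds_to_best([0.61, 0.70, 0.74, 0.73, 0.74]))  # 3
```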

Abstract

Federated learning (FL) enables collaborative training across organizations without sharing raw data, but it is hindered by statistical heterogeneity (non-i.i.d. client data) and by instability of naive weight averaging under client drift. In many cross-silo deployments, FL is warm-started from a strong pretrained backbone (e.g., ImageNet-1K) and then adapted to local domains. Motivated by recent evidence that ReLU-like gating regimes (structural knowledge) stabilize earlier than the remaining parameter values (quantitative knowledge), we propose FedSQ (Federated Structural-Quantitative learning), a transfer-initialized neural federated procedure based on a DualCopy, piecewise-linear view of deep networks. FedSQ freezes a structural copy of the pretrained model to induce fixed binary gating masks during federated fine-tuning, while only a quantitative copy is optimized locally and…
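The abstract breaks off before describing aggregation, so the following is only an assumption: that the server combines the clients' quantitative copies with plain FedAvg-style averaging weighted by local sample counts, while the frozen structural copy, identical on every client, is never communicated or averaged. The paper's "optimized" weight averaging may differ. The function `average_quantitative` and the dict-of-lists parameter format are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def average_quantitative(client_params: List[Params], num_samples: List[int]) -> Params:
    """FedAvg-style weighted average of the clients' quantitative copies.

    Each client is weighted by its local sample count. The structural copy is
    frozen and identical on every client, so it is neither sent nor averaged.
    """
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name, values in client_params[0].items():
        averaged[name] = [
            sum(n * params[name][k] for params, n in zip(client_params, num_samples)) / total
            for k in range(len(values))
        ]
    return averaged


# Two toy clients; client B holds three times as much data as client A.
client_a = {"layer1.weight": [1.0, 0.0]}
client_b = {"layer1.weight": [3.0, 4.0]}
print(average_quantitative([client_a, client_b], [1, 3]))  # {'layer1.weight': [2.5, 3.0]}
```

Restricting the average to the quantitative copy follows from the abstract's statement that only that copy is optimized locally; the sample-count weighting is the standard FedAvg choice, used here as a stand-in.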
