The Other Side of the Coin: Unveiling the Downsides of Model Aggregation in Federated Learning from a Layer-peeled Perspective

Guogang Zhu; Xuefeng Liu; Jianwei Niu; Shaojie Tang; Xinghao Wu

arXiv:2502.03231·cs.LG·June 17, 2025

Open Access · 3 Reviews

TL;DR

This paper uncovers how model aggregation in federated learning causes cumulative feature degradation across layers, impairing performance, and analyzes why existing solutions mitigate this issue by preserving feature quality.

Contribution

It introduces a layer-peeled analysis framework revealing the root causes of performance drops in federated learning due to feature degradation during aggregation.

Findings

01. Aggregation degrades both the quality of extracted features and the coupling between features and the layers that consume them.

02. Feature degradation accumulates with network depth, a phenomenon termed Cumulative Feature Degradation (CFD).

03. Existing solutions mitigate the performance drop by reducing feature degradation.
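The accumulation described in the findings can be illustrated with a toy measurement. The sketch below is not the paper's framework. It assumes a small bias-free ReLU MLP with random weights, uniform parameter averaging as the aggregation step, and a relative distance between local and aggregated features as a stand-in for a degradation metric. All function names (`make_model`, `forward_features`, `average_models`, `layerwise_relative_change`) are hypothetical.

```python
import math
import random


def make_model(rng, sizes):
    """Random bias-free MLP: one weight matrix (list of rows) per layer."""
    return [[[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward_features(model, x):
    """Return the ReLU activations after every layer for input vector x."""
    feats = []
    h = x
    for W in model:
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in W]
        feats.append(h)
    return feats


def average_models(models):
    """Uniform element-wise parameter averaging (assumed aggregation rule)."""
    n = len(models)
    return [[[sum(m[k][i][j] for m in models) / n
              for j in range(len(models[0][k][i]))]
             for i in range(len(models[0][k]))]
            for k in range(len(models[0]))]


def layerwise_relative_change(local, aggregated, inputs):
    """Per layer: ||F_agg - F_local|| / ||F_local|| over all inputs."""
    diff_sq = [0.0] * len(local)
    norm_sq = [0.0] * len(local)
    for x in inputs:
        f_loc = forward_features(local, x)
        f_agg = forward_features(aggregated, x)
        for k, (a, b) in enumerate(zip(f_loc, f_agg)):
            diff_sq[k] += sum((u - v) ** 2 for u, v in zip(a, b))
            norm_sq[k] += sum(u * u for u in a)
    # Guard against a layer whose local features are all zero.
    return [math.sqrt(d / max(n, 1e-12)) for d, n in zip(diff_sq, norm_sq)]


rng = random.Random(0)
sizes = [8, 16, 16, 4]  # illustrative layer widths, not from the paper
clients = [make_model(rng, sizes) for _ in range(4)]
global_model = average_models(clients)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(sizes[0])] for _ in range(32)]
changes = layerwise_relative_change(clients[0], global_model, inputs)
for depth, c in enumerate(changes, start=1):
    print(f"layer {depth}: relative feature change = {c:.3f}")
```

Nothing in this toy guarantees that the change grows with depth. The finding above concerns trained networks in federated learning, and the paper's own metrics may differ from the one used here.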

Abstract

It is often observed that the aggregated model in FL underperforms on local data until after several rounds of local training. This temporary performance drop can potentially slow down the convergence of the FL model. Prior work regards this performance drop as an inherent cost of knowledge sharing among clients and does not give it special attention. While some studies directly focus on designing techniques to alleviate the issue, its root causes remain poorly understood. To bridge this gap, we construct a framework that enables layer-peeled analysis of how feature representations evolve during model aggregation in FL. It focuses on two key aspects: (1) the intrinsic quality of extracted features, and (2) the alignment between features and their subsequent parameters -- both of which are critical to downstream performance. Using this framework, we first investigate how model…
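For readers unfamiliar with the aggregation step the abstract refers to, here is a minimal sketch assuming FedAvg-style aggregation, that is, an average of client parameters weighted by each client's sample count. The abstract does not state which aggregation rule the paper studies, so this is an assumption. The `fedavg` function and the flat parameter dictionaries are hypothetical.

```python
def fedavg(client_params, client_sizes):
    """Weighted average of per-client parameter dicts (name -> list of floats)."""
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    names = client_params[0].keys()
    return {
        name: [
            # Each client contributes in proportion to its sample count.
            sum(p[name][i] * n / total
                for p, n in zip(client_params, client_sizes))
            for i in range(len(client_params[0][name]))
        ]
        for name in names
    }


# Two hypothetical clients with tiny, made-up parameter vectors.
clients = [
    {"layer1.weight": [1.0, 2.0], "layer2.weight": [0.0]},
    {"layer1.weight": [3.0, 6.0], "layer2.weight": [4.0]},
]
global_params = fedavg(clients, client_sizes=[100, 300])
print(global_params)
```

The averaged parameters generally match no single client's local optimum, which is one way to picture why the aggregated model can underperform on local data until local training resumes.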

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The authors rigorously demonstrate that the negative impact of aggregation is not a uniform hit but a compounding problem that progressively accumulates with network depth.
2. The study offers a balanced perspective. It not only identifies the downsides of aggregation (CFD) but also validates its crucial upside, showing that aggregation is what enables the model to create more generalizable features and mitigate local overfitting.
3. The paper introduces a "layer-peeled" analysis framework

Weaknesses

1. The experimental setup involves a very small number of clients (e.g., 4 clients for PACS, 6 for DomainNet). This is not representative of typical cross-device FL scenarios, which can involve hundreds, thousands, or even millions of clients. The dynamics of averaging four or six models may be very different from averaging thousands, and it remains an open question whether the severity and behavior of CFD would scale, diminish, or change entirely in a massively federated setting.
2. The paper

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. The figure illustrating the layer-wise performance trend is well-presented and effectively supports the analysis.
2. The experimental setup is described with sufficient clarity and detail to ensure reproducibility.

Weaknesses

1. Limited novelty compared to prior layer-wise/feature-alignment analyses. Prior work already diagnoses aggregation-induced feature/layer misalignment and layer-dependent behavior, and studies when layer-wise averaging or alignment helps (e.g., Fed2 [1] aligns features across clients; pFedLA [2] learns layer-wise aggregation in a personalized FL setting; FedFA provides a detailed analysis of latent feature statistics and a feature alignment method; Layer-wise Linear Mode Connectivi

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- Overall, the paper is well-written, and the figures and explanations are clear and easy to follow.
- The paper provides a systematic set of metrics for analyzing the dynamics of features and model parameters in FL settings.

Weaknesses

- The analysis mainly focuses on the proposed analytical metrics without presenting accompanying accuracy trends to support the findings. While the paper suggests that model aggregation may degrade performance, it does not clearly demonstrate how the performance drops correlate with the reported feature and parameter metrics and their dynamics.
- At Lines 273-279, the paper briefly introduces and defines CFD as the larger relative changes in the metrics as network depth increases. However

Taxonomy

Topics · Privacy-Preserving Technologies in Data

Methods · Focus