CO-PFL: Contribution-Oriented Personalized Federated Learning for Heterogeneous Networks
Ke Xing, Yanjie Dong, Xiaoyi Fan, Runhao Zeng, Victor C. M. Leung, M. Jamal Deen, Xiping Hu

TL;DR
CO-PFL introduces a contribution-aware federated learning algorithm that dynamically assesses client contributions to improve personalization, robustness, and convergence in heterogeneous networks.
Contribution
It proposes a novel contribution-oriented aggregation method using dual-subspace analysis and integrates parameter-wise personalization with mask-aware momentum optimization.
Findings
Outperforms state-of-the-art methods in personalization accuracy.
Enhances robustness and scalability across benchmark datasets.
Improves convergence stability in heterogeneous federated learning environments.
Abstract
Personalized federated learning (PFL) addresses the critical challenge of collaboratively training customized models for clients with heterogeneous and scarce local data. Conventional federated learning, which relies on a single consensus model, proves inadequate under such data heterogeneity. Its standard aggregation method, which weights client updates heuristically or by data volume, operates under an equal-contribution assumption and fails to account for the actual utility and reliability of each client's update. This often results in suboptimal personalization and aggregation bias. To overcome these limitations, we introduce Contribution-Oriented PFL (CO-PFL), a novel algorithm that dynamically estimates each client's contribution for global aggregation. CO-PFL performs a joint assessment by analyzing both gradient direction discrepancies and prediction deviations, leveraging…
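The contrast the abstract draws between data-volume weighting and contribution-based weighting can be illustrated with a minimal sketch. This is not the CO-PFL algorithm: the per-client scores are taken as given inputs, the softmax mapping from scores to weights is an assumption made for illustration, and all function names are hypothetical. The paper's actual scoring, which combines gradient direction discrepancies and prediction deviations, is not reproduced here.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def data_volume_weights(sample_counts):
    """Conventional weighting: proportional to local dataset size."""
    return normalize([float(c) for c in sample_counts])


def contribution_weights(scores, temperature=1.0):
    """Hypothetical mapping from contribution scores to weights (softmax).

    ASSUMPTION: the softmax and the temperature parameter are illustrative
    choices, not taken from the paper.
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    return normalize(exps)


def aggregate(updates, weights):
    """Weighted average of client updates, each a flat list of floats."""
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) for i in range(dim)]


if __name__ == "__main__":
    updates = [[1.0, 0.0], [0.0, 1.0]]
    # Client 1 holds more data, but client 0 is given the higher (made-up) score.
    print(aggregate(updates, data_volume_weights([10, 30])))
    print(aggregate(updates, contribution_weights([2.0, 0.5])))
```

With data-volume weights, the larger client dominates the aggregate regardless of how useful its update is. With score-based weights, the balance shifts toward whichever client is scored as contributing more.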
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The proposed method is novel in that it specializes three modules for improving PFL.
- The authors validated the proposed method extensively on multiple vision benchmark datasets.
- The notion of "contribution" should have been clearly defined. It is supposed to be a contribution toward the optimization of a global model, but it could differ in the PFL context, e.g., contribution to clients having similar distributions, contribution to data-deficient clients, ...
- The rationale behind each module is heuristic and incremental rather than theory-driven or based on prior art.
- For example, the MAMO module resembles the MADA module proposed in [7], as well as Adam optimiz
1. The introduction section is well written, providing a fairly comprehensive summary of the developments in the related research area.
2. The authors present the fundamental formulation of the optimization problem clearly, making it easy to understand the problem to be addressed.
1. The notation in the paper is rather confusing, as many symbols are not clearly defined at the point of use or beforehand. This includes those related to the structured decomposition of the model and the corresponding variables used during training.
2. The paper lacks sufficient theoretical analysis. It is built upon a series of pruning and aggregation strategies, introducing modifications at several key stages of the PFL framework. However, it remains unclear whether these modifications pres
* Each component is intuitively reasonable.
* The empirical results indicate that every module contributes positively.
1. Overstated novelty on contribution-aware aggregation. Line 114 claims that “most existing methods fall short in effectively quantifying and incorporating the value of each client’s contribution during aggregation.” However, prior work has already studied contribution/weight learning (e.g., AFL [1], Shapley-driven weighting [2]). The paper neither compares with nor discusses these lines sufficiently.
2. PWPM and MAMO appear closely related to prior masked/sparse personalization and mask-a
**1) Clear and well-structured presentation:** The paper is clearly written, with a logical decomposition of ideas across the three modules. Each module addresses a distinct challenge in personalized federated learning: aggregation bias, over/under-personalization, and unstable optimization due to masking. The overall workflow and pseudocode are easy to follow, and the explanations are intuitive.
**2) Strong empirical results:** The method shows consistent performance gains across multiple datas
**1) Clarification on the COWA computation:** It is not entirely clear where the computation of the COWA module, particularly the prediction score $\Gamma_n^{\mathrm{data}}$, is performed. Calculating this score requires access to the leave-one-out aggregated model $w_{-n}^k$, which itself depends on $\alpha_n^k$. However, $\alpha_n^{k+1}$ is computed on the *server side* using the contribution scores of all clients, as described in Algorithm 1. This creates uncertainty about the flow of informa
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Recommender Systems and Techniques · Domain Adaptation and Few-Shot Learning
