FedCova: Robust Federated Covariance Learning Against Noisy Labels
Xiangyu Zhong, Xiaojun Yuan, Ying-Jun Angela Zhang

TL;DR
FedCova introduces a robust federated learning framework that enhances model resilience to noisy labels by leveraging feature covariances, eliminating reliance on external clean datasets or device selection.
Contribution
FedCova proposes a covariance-based approach that gives federated models intrinsic robustness to label noise, unifying feature encoding, classifier construction, and label correction in a single framework.
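To make "classifier construction and label correction from covariances" concrete, here is a minimal sketch under loudly stated assumptions. It is a hypothetical simplification, not FedCova's published classifier. Each class is summarized by the uncentered covariance of its unit-normalized features. A feature is scored against a class by the Rayleigh quotient u^T Σ_j u, which is the fraction of that class's feature energy lying along the feature's direction. A label is "corrected" only if another class wins by more than a threshold. The function names and the `margin` rule are illustrative inventions.

```python
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def unit(z: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm."""
    norm = sum(x * x for x in z) ** 0.5
    if norm == 0.0:
        raise ValueError("zero feature vector has no direction")
    return [x / norm for x in z]


def second_moment(features: List[Sequence[float]]) -> Matrix:
    """Uncentered covariance (1/n) * sum_i u_i u_i^T of unit-normalized features (trace = 1)."""
    us = [unit(z) for z in features]
    d, n = len(us[0]), len(us)
    cov = [[0.0] * d for _ in range(d)]
    for u in us:
        for i in range(d):
            for k in range(d):
                cov[i][k] += u[i] * u[k] / n
    return cov


def energy(z: Sequence[float], cov: Matrix) -> float:
    """Rayleigh quotient u^T Sigma u: fraction of a class's feature energy along z."""
    u = unit(z)
    d = len(u)
    return sum(u[i] * cov[i][k] * u[k] for i in range(d) for k in range(d))


def classify(z: Sequence[float], covs: Dict[int, Matrix]) -> int:
    """Predict the class whose covariance captures the most energy along z."""
    return max(covs, key=lambda j: energy(z, covs[j]))


def correct_label(z: Sequence[float], given: int, covs: Dict[int, Matrix], margin: float) -> int:
    """Hypothetical relabeling rule: switch only if another class wins by more than `margin`."""
    best = classify(z, covs)
    if best != given and energy(z, covs[best]) - energy(z, covs[given]) > margin:
        return best
    return given


covs = {
    0: second_moment([[1.0, 0.0], [2.0, 0.0]]),   # class 0 lies along the x-axis
    1: second_moment([[0.0, 1.0], [0.0, -3.0]]),  # class 1 lies along the y-axis
}
print(classify([3.0, 1.0], covs))                 # 0
print(correct_label([3.0, 1.0], 1, covs, 0.5))    # 0 (relabeled: 0.9 vs 0.1)
print(correct_label([3.0, 1.0], 1, covs, 0.9))    # 1 (margin not met, label kept)
```

Because the score depends on the shape of each class's feature subspace rather than on a class mean, a few mislabeled samples tilt Σ_j only slightly. This mirrors the intuition that reviewers attribute to the method, though the real construction may differ.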
Findings
Outperforms state-of-the-art methods on CIFAR-10/100 and Clothing1M datasets.
Effective under both symmetric and asymmetric label noise.
Demonstrates superior robustness against label noise.
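The two noise settings named above are usually simulated by corrupting clean benchmark labels. The sketch below shows one common convention and is an assumption: the paper's exact protocol is not given here. Symmetric noise flips a label, with probability `rate`, to a uniformly chosen *different* class. Some works allow the true class to be drawn as well. Asymmetric noise flips only specific classes to a confusable partner. The CIFAR-10 mapping shown is the one frequently used in the noisy-label literature, and it is not confirmed as FedCova's. All function names are illustrative.

```python
import random
from typing import Dict, List


def symmetric_noise(labels: List[int], num_classes: int, rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, replace each label by a uniformly drawn *different* class."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy


def asymmetric_noise(labels: List[int], mapping: Dict[int, int], rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, flip each label found in `mapping` to its designated confusable class."""
    return [mapping[y] if y in mapping and rng.random() < rate else y for y in labels]


# CIFAR-10 indices: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer, 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.
# Flips: truck->automobile, bird->airplane, deer->horse, cat<->dog.
CIFAR10_ASYM = {9: 1, 2: 0, 4: 7, 3: 5, 5: 3}

labels = list(range(10))
print(symmetric_noise(labels, 10, 0.4, random.Random(0)))
print(asymmetric_noise(labels, CIFAR10_ASYM, 1.0, random.Random(0)))  # [0, 1, 0, 5, 7, 3, 6, 7, 8, 1]
```

Asymmetric noise is usually the harder case because the corruption is class-dependent and systematic, so it cannot be averaged away the way uniform flips partly can.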
Abstract
Noisy labels in distributed datasets induce severe local overfitting and consequently compromise the global model in federated learning (FL). Most existing solutions rely on selecting clean devices or aligning with public clean datasets, rather than endowing the model itself with robustness. In this paper, we propose FedCova, a dependency-free federated covariance learning framework that eliminates such external dependencies by enhancing the model's intrinsic robustness via a new perspective on feature covariances. Specifically, FedCova encodes data into a discriminative but resilient feature space to tolerate label noise. Building on mutual information maximization, we design a novel objective for federated lossy feature encoding that relies solely on class feature covariances with an error tolerance term. Leveraging feature subspaces characterized by covariances, we construct a…
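An objective for "lossy feature encoding that relies solely on class feature covariances with an error tolerance term" resembles the lossy coding rate used in maximal coding rate reduction (MCR²), one known lossy-coding surrogate for I(Z;Y). As a hedged illustration only, and not FedCova's published objective, the MCR² rate reduction can be written purely in terms of per-class covariances. Here ε² is the tolerated distortion. The symbols Z_j ∈ ℝ^{d×n_j} (features carrying label j), n_j, n = Σ_j n_j, J, and d are notation assumed for this sketch.

```latex
\begin{aligned}
\Sigma_j &= \frac{1}{n_j} Z_j Z_j^{\top}, \qquad
\Sigma = \sum_{j=1}^{J} \frac{n_j}{n}\,\Sigma_j,\\
R(\Sigma;\epsilon) &= \frac{1}{2}\log\det\!\Big(I_d + \frac{d}{\epsilon^{2}}\,\Sigma\Big),\\
\Delta R &= R(\Sigma;\epsilon) - \sum_{j=1}^{J} \frac{n_j}{n}\,R(\Sigma_j;\epsilon).
\end{aligned}
```

Maximizing ΔR expands the feature space as a whole while compressing each class. Every term depends only on Σ_j and n_j, so a server could in principle assemble it from client-side statistics, for example Σ_j = Σ_k (n_j^k / n_j) Σ_j^k over clients k. Whether FedCova aggregates covariances this way is an assumption.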
Peer Reviews
Decision · Submitted to ICLR 2026
1. The proposed solution is conceptually interesting and demonstrates creativity in its construction. 2. The notation throughout the paper is clear and mathematically rigorous. 3. The toy example provided in the Appendix effectively illustrates the main idea and makes the proposed solution easy to understand. 4. The ablation studies presented in the Appendix (e.g., Figures 2 and 3) show promising results and provide useful insights into the method’s behavior. 5. The analytical discussions in the
1. **Baselines:** More baseline methods should be included for a fair comparison. For instance, a straightforward approach to the noisy-label problem in federated learning is to apply existing noisy-label learning techniques locally within the FedAvg framework. The authors should consider this variant to improve the completeness and credibility of the experimental evaluation. 2. **Ablation study:** The ablation study should analyze the impact of the number of clients on model performance, a
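The baseline the reviewer proposes, FedAvg with a noise-robust technique applied locally, changes only two pieces relative to plain FedAvg: the client's loss and the server's weighted average. The sketch below picks generalized cross entropy (GCE) as the local technique. That choice, the default `q = 0.7`, and the function names are assumptions for illustration. Any other local noisy-label method could be substituted, and local SGD itself is omitted.

```python
from typing import List, Sequence


def gce_loss(probs: Sequence[float], label: int, q: float = 0.7) -> float:
    """Generalized cross entropy (1 - p_y**q) / q.

    Approaches standard cross entropy as q -> 0 and becomes the bounded,
    MAE-like loss 1 - p_y at q = 1, which limits the pull of mislabeled samples.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    return (1.0 - probs[label] ** q) / q


def fedavg(client_params: List[Sequence[float]], client_sizes: List[int]) -> List[float]:
    """FedAvg aggregation: average client parameter vectors weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


print(gce_loss([0.5, 0.5], label=0, q=1.0))       # 0.5
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))    # [2.5, 3.5]
```

Each client would minimize `gce_loss` over its local (possibly noisy) data for a few epochs and send its parameters, which the server combines with `fedavg`.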
1. The paper tackles the important challenge of noisy labels in federated settings, where labels from distributed edge devices are vulnerable to annotation errors, sensor faults, and adversarial attacks. The covariance-based approach naturally avoids bias from mislabeled data by focusing on feature structures rather than centroids. Unlike existing methods requiring warm-up rounds, clean public datasets, or extremely noisy devices, FedCova achieves robustness without these additional dependencies
1. Insufficient privacy analysis and vulnerability to potential attacks: While the paper claims that covariance transmission poses lower privacy risk than raw features due to dimensionality reduction, this assertion lacks rigorous analysis. Recent work has shown that covariance matrices can still leak sensitive information about training data through gradient-based attacks or reconstruction methods. The paper provides no formal privacy guarantees, no discussion of differential privacy mechanisms
1. The main idea is interesting. The paper presents a principled information-theoretic approach by maximizing mutual information I(Z;Y) while focusing on covariance structures rather than mean statistics. 2. The integration of feature encoding, classification, and label correction through covariance structures provides an elegant and cohesive solution, avoiding the need for auxiliary clean datasets or duplicate models required by existing methods. 3. The covariance transmission overhead is only
1. While the mutual information objective is well-motivated, the paper lacks formal convergence guarantees or theoretical bounds on the robustness to label noise. 2. The method introduces several hyperparameters (ε², α, d, η_c) that require tuning. 3. The approach requires estimating and storing J×d×d covariance matrices. For problems with many classes (J ≫ 100) or high feature dimensions, this could become prohibitive despite the claimed efficiency. 4. Only symmetric noise is considered; asymmet
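The storage concern in point 3 reduces to simple arithmetic. Storing J dense d×d matrices takes J·d²·b bytes for b-byte entries. Exploiting symmetry (upper triangle only) lowers this to J·d(d+1)/2·b. The same count would apply to a per-round upload if a client sent all J class covariances, which is an assumption about the protocol. The function name and the example sizes below are hypothetical.

```python
def covariance_bytes(num_classes: int, dim: int, bytes_per_entry: int = 4, packed: bool = False) -> int:
    """Memory for J covariance matrices of size d x d.

    packed=True counts only the upper triangle, d*(d+1)/2 entries, since each matrix is symmetric.
    """
    entries = dim * (dim + 1) // 2 if packed else dim * dim
    return num_classes * entries * bytes_per_entry


# Hypothetical sizes: 1000 classes, 512-dim features, float32 entries.
print(covariance_bytes(1000, 512))               # 1048576000 bytes, about 1.05 GB
print(covariance_bytes(1000, 512, packed=True))  # 525312000 bytes, about 0.53 GB
print(covariance_bytes(10, 512))                 # 10485760 bytes, about 10.5 MB
```

Cost grows linearly in J but quadratically in d, so the feature dimension is the main lever. Symmetric packing only halves the footprint.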
Taxonomy
Topics: Machine Learning and Data Classification · Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning
