TL;DR
This paper introduces FEDMOSAIC, a personalized federated learning method that adaptively collaborates based on per-example trust, improving performance in heterogeneous data settings without sharing raw data.
Contribution
It proposes a novel trust-aware, fine-grained collaboration approach in federated learning, demonstrating its effectiveness through FEDMOSAIC and theoretical convergence guarantees.
Findings
FEDMOSAIC outperforms strong FL and PFL baselines in non-IID settings.
The method achieves better results than local and centralized training.
Convergence is proven under standard assumptions.
Abstract
Data heterogeneity poses a fundamental challenge in federated learning (FL), especially when clients differ not only in distribution but also in the reliability of their predictions across individual examples. While personalized FL (PFL) aims to address this, we observe that many PFL methods fail to outperform two necessary baselines, local training and centralized training. This suggests that meaningful personalization only emerges in a narrow regime, where global models are insufficient, but collaboration across clients still holds value. Our empirical findings point to two key ingredients for success in this regime: adaptivity in collaboration and fine-grained trust, at the level of individual examples. We show that these properties can be achieved within federated semi-supervised learning, where clients exchange predictions over a shared unlabeled dataset. This enables each client…
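As a rough illustration of what per-example trust over a shared unlabeled dataset could look like, the sketch below has each client submit class probabilities for every shared example and weights each client's vote by its confidence on that example. This is not FEDMOSAIC's actual aggregation rule, which the truncated abstract does not specify. The function name `aggregate_pseudo_labels`, the use of the maximum class probability as the confidence score, and the per-example normalization are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: probs[client][example][class], each inner list a probability vector.
Probs = List[List[List[float]]]


def aggregate_pseudo_labels(probs: Probs) -> Tuple[List[int], List[List[float]]]:
    """Confidence-weighted consensus over a shared unlabeled set (illustrative only).

    Returns one pseudo-label per shared example and the per-example weight
    given to each client (weights[client][example]).
    """
    n_clients = len(probs)
    n_examples = len(probs[0])
    n_classes = len(probs[0][0])

    labels: List[int] = []
    weights = [[0.0] * n_examples for _ in range(n_clients)]

    for j in range(n_examples):
        # Assumed trust score: the client's maximum class probability on example j.
        conf = [max(probs[i][j]) for i in range(n_clients)]
        total = sum(conf)

        consensus = [0.0] * n_classes
        for i in range(n_clients):
            w = conf[i] / total  # weights sum to 1 across clients for each example
            weights[i][j] = w
            for c in range(n_classes):
                consensus[c] += w * probs[i][j][c]

        # Pseudo-label = class with the highest weighted consensus probability.
        labels.append(max(range(n_classes), key=consensus.__getitem__))

    return labels, weights
```

In this sketch a client that is confident on one example and unsure on another contributes strongly to the first pseudo-label and weakly to the second, which is the fine-grained behavior the abstract describes; no raw data or model parameters are exchanged, only predictions.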
Peer Reviews
Decision·Submitted to ICLR 2026
S1: The proposed co-training approach is novel and offers numerous opportunities for further improvement. In this framework, neither local data nor local models are shared, which enhances privacy preservation and reduces communication costs with the server. S2: The main contribution of the authors is the introduction of a dynamic mechanism that combines local loss functions and adaptively updates the labels of samples in the global dataset U. This mechanism provides flexibility and allows the pro
W1. From Algorithm 1, it appears that the authors assume all clients participate in the co-training process at every time step (or communication round). This assumption differs from that of conventional federated learning and may not be feasible in real-world applications. Furthermore, even under this assumption, the proposed mechanism faces scalability challenges. As the number of clients increases (e.g., hundreds or thousands) and/or the size of the global dataset U grows, the communication a
1. The writing is clear and easy to follow. 2. It provides theoretical analysis of communication cost, convergence, and privacy. 3. It achieves superior performance compared to many other algorithms using standard datasets with strong skewness.
1. The confidence scores are not explicitly explained in the paper. L205 discusses two different methods to calculate the confidence scores. However, they are not explained in detail anywhere. Only their performances are shown in the experiments. This should be addressed before it is ready for publication. 2. The algorithm relies on the availability of a public unlabelled dataset, which may not exist in practice. Moreover, the FL baselines in the experiments are not utilizing the public data, w
The research problem is relevant. Deriving algorithms that allow collaborative training agnostic to the local training algorithms, while improving efficiency by avoiding expensive model sharing is quite interesting. The empirical findings of the paper are quite promising, showcasing that FedMosaic outperforms other personalized federated learning baselines in different heterogeneity scenarios. FedMosaic is also shown to consistently outperform local learning and global federated learning.
- The bounded objective drift assumption is not standard in the distributed optimization literature. The authors do provide an explanation of why they expect this assumption to hold in practice (line 300); however, no guarantee on $\delta$ is provided. Given that this term appears additively in the convergence result, this is a key limitation that needs to be addressed. - The optimization result involves upper bounding $\|\nabla L_t^i(\theta_t)\|^2$, but this is not really a relevant quant
