Robust Training of Federated Models with Extremely Label Deficiency
Yonggang Zhang, Zhiqin Yang, Xinmei Tian, Nannan Wang, Tongliang Liu, Bo Han

TL;DR
This paper introduces Twin-sight, a twin-model paradigm for federated semi-supervised learning that mitigates gradient conflicts and improves model performance under label deficiency.
Contribution
It proposes a novel twin-model approach with a neighborhood-preserving constraint to enhance mutual guidance in federated semi-supervised learning.
Findings
Twin-sight significantly outperforms state-of-the-art methods.
The neighborhood-preserving constraint improves model synergy.
Experimental results validate the effectiveness of Twin-sight.
Abstract
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient conflicts. To alleviate gradient conflict, we propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. In particular, Twin-sight concurrently trains a supervised model with a supervised objective function while training an unsupervised model using an unsupervised objective function. To enhance the synergy between these two models, Twin-sight introduces a neighbourhood-preserving…
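The abstract is cut off before it describes the neighbourhood-preserving constraint, so the following is only a rough illustration of what "preserving neighbourhoods" between two models could mean, not the paper's definition. It assumes that each model maps the same samples to feature vectors, that neighbours are the k most cosine-similar samples, and that agreement is scored as the average overlap of the two neighbour sets. The names `knn` and `neighbourhood_overlap` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn(features, i, k):
    """Indices of the k samples most cosine-similar to sample i (excluding i)."""
    scored = [(cosine(features[i], f), j) for j, f in enumerate(features) if j != i]
    # Sort by descending similarity; break ties by index for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return {j for _, j in scored[:k]}


def neighbourhood_overlap(feats_a, feats_b, k):
    """Average fraction of shared k-nearest neighbours across two feature spaces.

    Assumption: feats_a[i] and feats_b[i] are the two models' features
    for the same input sample i.
    """
    n = len(feats_a)
    total = 0.0
    for i in range(n):
        total += len(knn(feats_a, i, k) & knn(feats_b, i, k)) / k
    return total / n


# Hypothetical features from a "supervised" and an "unsupervised" model.
feats_sup = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same geometry rotated by 90 degrees: neighbourhoods are unchanged.
feats_unsup = [[0.0, 1.0], [-0.1, 0.9], [-1.0, 0.0], [-0.9, 0.1]]
print(neighbourhood_overlap(feats_sup, feats_unsup, k=1))
```

A score near 1 means the two feature spaces agree on which samples are close to each other even if the coordinates differ; a constraint of this flavour would penalize low agreement during training.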
Peer Reviews
Decision: ICLR 2024 poster
The authors explore a novel approach for leveraging unlabeled data in the federated setting and demonstrate improved model performance as a result. The range of baselines and data sets is good. Overall the paper is well structured and well written.
What are gradient conflicts? In vanilla supervised training gradients are averaged over all data. Certainly there are some data points whose gradients differ in their direction. Is this a significant cause for concern? Has it been studied elsewhere? Naturally, the dissimilarity between gradients of different data points will reach an extreme as the loss nears a local minimum. Yet, to my knowledge, this has not concerned anyone. The neighborhood relation is a foundational technical element of t
- The paper considers a practically important problem and proposes a useful and principled solution to it.
- The paper is overall well written and easy to follow. It is well organized and the presentation is clear.
- The idea is straightforward and well motivated, and has a certain degree of originality and significance.
- An extended empirical study is conducted.
- The motivation needs to be further explained.
- Since "client drift" has been observed before, how is "gradient cliff" different from it, given that the first contribution claims the phenomenon?
- When exactly will "gradient cliff" happen? Does it happen all the time when training traditional FSSL methods?
- Since "The twin-model paradigm naturally avoids the issue of gradient conflict.", it would be nice to show how gradient similarities are improved by Twin-sight.
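The gradient-similarity check requested in the last point can be illustrated with a toy example. The sketch below is not from the paper: it assumes that "conflict" means a negative cosine similarity between the gradients of the supervised and unsupervised objectives with respect to shared parameters, and it uses two hypothetical quadratic losses and finite-difference gradients in place of a real model.

```python
import math


def numerical_grad(loss, w, eps=1e-6):
    """Central-difference gradient of a scalar loss at parameter vector w."""
    grad = []
    for i in range(len(w)):
        hi = list(w)
        lo = list(w)
        hi[i] += eps
        lo[i] -= eps
        grad.append((loss(hi) - loss(lo)) / (2 * eps))
    return grad


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical objectives on shared parameters w = [w0, w1]:
# the "supervised" loss pulls w0 toward +1, the "unsupervised" one toward -1.
def supervised_loss(w):
    return (w[0] - 1.0) ** 2 + w[1] ** 2


def unsupervised_loss(w):
    return (w[0] + 1.0) ** 2 + w[1] ** 2


def gradient_similarity(w):
    """Cosine similarity between the two objectives' gradients at w."""
    g_sup = numerical_grad(supervised_loss, w)
    g_unsup = numerical_grad(unsupervised_loss, w)
    return cosine(g_sup, g_unsup)


# Between the two minima the gradients point in opposite directions
# (similarity close to -1); far from both they agree (close to +1).
print(gradient_similarity([0.0, 0.0]))
print(gradient_similarity([3.0, 0.0]))
```

In a real FSSL experiment the same measurement would be taken on model gradients from labeled and unlabeled batches, and tracked over training rounds for a single-model baseline versus the twin-model setup.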
1. This paper proposes to train both supervised and unsupervised models on the client side to avoid gradient conflict caused by different objective functions when aggregating models on the server.
2. This paper introduces a neighborhood-preserving constraint to enable the supervised model and unsupervised model to fully interact with effective insights, collaborate and mutually benefit from each other’s strengths.
3. This paper conducts comprehensive comparison experiments and ablation experiment
1. This paper combines existing techniques to solve the federated semi-supervised learning problem, but lacks the necessary references and explanations. For example, the unsupervised algorithm adopted in formula (6) is not explained. For f(w_u; ·) as an unsupervised model, what are its specific working principles and objectives? What does sim(·) mean? What kind of neighborhood relation construction function is N(·) in formula (9), and how does it construct the neighborhood relation?
2. The experimental results of
Taxonomy
Topics: Machine Learning and Data Classification · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms
MethodsFocus
