Robust Training of Federated Models with Extremely Label Deficiency

Yonggang Zhang; Zhiqin Yang; Xinmei Tian; Nannan Wang; Tongliang Liu; Bo Han

arXiv:2402.14430·cs.LG·February 23, 2024·5 cites


TL;DR

This paper introduces Twin-sight, a twin-model paradigm for federated semi-supervised learning that mitigates gradient conflicts and improves model performance under label deficiency.
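A minimal sketch of the server-side half of a twin-model paradigm, assuming FedAvg-style aggregation weighted by client sample counts (the listing does not state the paper's aggregation rule). Models are represented as flat lists of floats, and `aggregate_twin` and `weighted_average` are hypothetical helper names. It shows one idea only: supervised parameters are averaged with supervised parameters and unsupervised with unsupervised, so the two objectives never share one set of weights.

```python
from typing import List, Tuple

# Hypothetical representation: a model is a flat list of float parameters.
Params = List[float]


def weighted_average(models: List[Params], weights: List[float]) -> Params:
    """FedAvg-style weighted average of equally sized parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) / total for i in range(dim)]


def aggregate_twin(client_updates: List[Tuple[Params, Params, int]]) -> Tuple[Params, Params]:
    """Aggregate the supervised and unsupervised models separately.

    Each client update is (supervised_params, unsupervised_params, num_samples).
    Weighting by num_samples is an assumption borrowed from FedAvg.
    """
    sup = [u[0] for u in client_updates]
    unsup = [u[1] for u in client_updates]
    weights = [float(u[2]) for u in client_updates]
    return weighted_average(sup, weights), weighted_average(unsup, weights)
```

For example, two clients reporting `([1.0, 2.0], [0.0, 0.0], 1)` and `([3.0, 4.0], [2.0, 2.0], 3)` yield a supervised model `[2.5, 3.5]` and an unsupervised model `[1.5, 1.5]`.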

Contribution

It proposes a novel twin-model approach with a neighborhood-preserving constraint to enhance mutual guidance in federated semi-supervised learning.
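The listing names a neighborhood-preserving constraint but does not define it. Below is one plausible instantiation, assumed here and not taken from the paper: sim(·) is cosine similarity, N(i) is the set of k nearest neighbors of sample i in the unsupervised model's feature space, and the penalty is the mean squared gap between the two models' pairwise similarities over those neighbor pairs. `knn_neighbors` and `neighborhood_penalty` are hypothetical names; features are plain lists of floats and must be non-zero vectors.

```python
import math
from typing import List, Set

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_neighbors(features: List[Vector], k: int) -> List[Set[int]]:
    """Hypothetical N(.): the k most cosine-similar samples to each sample, excluding itself."""
    neighbors = []
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, ties by index
        neighbors.append({j for _, j in scored[:k]})
    return neighbors


def neighborhood_penalty(feat_sup: List[Vector], feat_unsup: List[Vector], k: int) -> float:
    """Mean squared similarity gap between the two models over unsupervised-model neighborhoods."""
    nbrs = knn_neighbors(feat_unsup, k)
    total, count = 0.0, 0
    for i, ni in enumerate(nbrs):
        for j in ni:
            gap = cosine(feat_sup[i], feat_sup[j]) - cosine(feat_unsup[i], feat_unsup[j])
            total += gap * gap
            count += 1
    return total / count
```

Anchoring the neighborhoods on the unsupervised features lets the label-free model's local structure guide the supervised one; the reverse direction, or a symmetric version, would be equally reasonable choices under these assumptions.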

Findings

01. Twin-sight significantly outperforms state-of-the-art methods.

02. The neighborhood-preserving constraint improves model synergy.

03. Experimental results validate the effectiveness of Twin-sight.

Abstract

Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. Advanced FSSL methods predominantly focus on training a single model on each client. However, this approach could lead to a discrepancy between the objective functions of labeled and unlabeled data, resulting in gradient conflicts. To alleviate gradient conflict, we propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. In particular, Twin-sight concurrently trains a supervised model with a supervised objective function while training an unsupervised model using an unsupervised objective function. To enhance the synergy between these two models, Twin-sight introduces a neighbourhood-preserving…
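To make "gradient conflict" concrete, here is a toy sketch that assumes one common operationalization (not stated in the listing): two gradients conflict when their cosine similarity is negative. The two quadratic losses are hypothetical stand-ins for the labeled and unlabeled objectives acting on one shared model, and `grad_quadratic` and `conflicts` are illustrative names.

```python
import math
from typing import List

Vector = List[float]


def grad_quadratic(w: Vector, target: Vector) -> Vector:
    """Gradient of the toy loss ||w - target||^2, which is 2 * (w - target)."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def conflicts(a: Vector, b: Vector) -> bool:
    """Treat two gradients as conflicting when their cosine similarity is negative."""
    return cosine(a, b) < 0.0


# One shared model w, as in single-model FSSL.
w = [0.0, 0.0]
g_sup = grad_quadratic(w, [1.0, 0.0])     # toy "labeled" objective pulls w toward (1, 0)
g_unsup = grad_quadratic(w, [-1.0, 0.5])  # toy "unlabeled" objective pulls w toward (-1, 0.5)
g_total = [s + u for s, u in zip(g_sup, g_unsup)]

print(g_sup, g_unsup, conflicts(g_sup, g_unsup))
print(g_total)  # the labeled signal along the first axis cancels out
```

Here the cosine similarity is about -0.89, and the summed gradient `[0.0, -1.0]` has lost the labeled objective's pull along the first axis entirely, which is the situation the twin-model paradigm is designed to avoid.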

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

The authors explore a novel approach for leveraging unlabeled data in the federated setting and demonstrate improved model performance as a result. The range of baselines and data sets is good. Overall the paper is well structured and well written.

Weaknesses

What are gradient conflicts? In vanilla supervised training gradients are averaged over all data. Certainly there are some data points whose gradients differ in their direction. Is this a significant cause for concern? Has it been studied elsewhere? Naturally, the dissimilarity between gradients of different data points will reach an extreme as the loss nears a local minimum. Yet, to my knowledge, this has not concerned anyone. The neighborhood relation is a foundational technical element of t

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The paper considers a practically important problem and proposes a useful and principled solution to it.
- The paper is overall well-written and easy to follow. It is well organized and the presentation is clear.
- The idea is straightforward and well-motivated, and has a certain degree of originality and significance.
- An extended empirical study is conducted.

Weaknesses

- The motivation needs to be further explained.
- Since "client drift" has been observed before, how is "gradient cliff" different from it, given that the first contribution claims this phenomenon?
- When exactly does "gradient cliff" happen? Does it happen all the time when training traditional FSSL methods?
- Since "the twin-model paradigm naturally avoids the issue of gradient conflict", it would be nice to show how gradient similarities are improved by Twin-sight.

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

1. This paper proposes to train both supervised and unsupervised models on the client side to avoid gradient conflict caused by different objective functions when aggregating models on the server.
2. This paper introduces a neighborhood-preserving constraint to enable the supervised model and unsupervised model to fully interact with effective insights, collaborate, and mutually benefit from each other’s strengths.
3. This paper conducts comprehensive comparison experiments and ablation experiment

Weaknesses

1. This paper combines existing techniques to solve the federated semi-supervised learning problem, but lacks the necessary references and explanations. For example, the unsupervised algorithm adopted in formula (6) is not explained. As an unsupervised model, what are the specific working principles and objectives of f(w_u; ·)? What does sim(·) mean? What kind of neighborhood relation construction function is N(·) in formula (9), and how does it construct the neighborhood relation?
2. The experimental results of


Taxonomy

Topics: Machine Learning and Data Classification · Privacy-Preserving Technologies in Data · Machine Learning and Algorithms
