Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
Hao Yan, Yuhong Guo

TL;DR
This paper introduces a lightweight unsupervised federated learning method that leverages pretrained vision-language models like CLIP to improve model performance on unlabeled data across clients, reducing training complexity and communication costs.
Contribution
The paper proposes a novel unsupervised federated learning approach using pretrained CLIP models, incorporating self-training and class-balanced feature sampling to handle data heterogeneity.
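The two ingredients named above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes image and text features have already been extracted by a frozen CLIP-style encoder, and it assumes "class-balanced feature sampling" means drawing an equal number of synthetic features around each class's text feature. The Gaussian perturbation, the function names, and the `noise_std` parameter are all hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def zero_shot_pseudo_labels(image_feats, text_feats):
    """Give each image feature the class whose text feature is most cosine-similar."""
    text = [normalize(t) for t in text_feats]
    labels = []
    for f in image_feats:
        f = normalize(f)
        sims = [dot(f, t) for t in text]
        labels.append(max(range(len(text)), key=sims.__getitem__))
    return labels


def class_balanced_samples(text_feats, per_class, noise_std, rng):
    """Draw the same number of synthetic features around every class text feature.

    Assumption: Gaussian noise around the text feature; the summary does not
    state how the sampling is actually done.
    """
    samples, labels = [], []
    for c, t in enumerate(text_feats):
        t = normalize(t)
        for _ in range(per_class):
            noisy = [x + rng.gauss(0.0, noise_std) for x in t]
            samples.append(normalize(noisy))
            labels.append(c)
    return samples, labels


# Toy 2-D features standing in for real encoder outputs.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[0.9, 0.1], [0.2, 0.8]]
pseudo = zero_shot_pseudo_labels(image_feats, text_feats)
synthetic, synthetic_labels = class_balanced_samples(
    text_feats, per_class=3, noise_std=0.05, rng=random.Random(0)
)
```

In a self-training loop, the pseudo-labels would be refined over rounds as a classifier is fit on them, while the balanced synthetic samples would counteract classes that are rare or missing on a given client.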
Findings
Significantly improves model accuracy over CLIP zero-shot predictions.
Outperforms supervised federated learning benchmark methods when computation and communication budgets are limited.
Reduces computational and communication overhead in federated settings.
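To make the communication saving concrete, the sketch below assumes that clients exchange only a small linear classification head on top of a frozen encoder, and that the server combines the heads with a FedAvg-style average weighted by client sample count. Both are assumptions for illustration; this summary does not spell out what is transmitted or how it is aggregated. The function names and the feature dimension used in the example are hypothetical.

```python
def fedavg_linear_heads(client_heads, client_sizes):
    """Average per-client weight matrices (lists of rows), weighted by sample count."""
    total = sum(client_sizes)
    rows, cols = len(client_heads[0]), len(client_heads[0][0])
    avg = [[0.0] * cols for _ in range(rows)]
    for head, n in zip(client_heads, client_sizes):
        w = n / total
        for i in range(rows):
            for j in range(cols):
                avg[i][j] += w * head[i][j]
    return avg


def head_param_count(feature_dim, num_classes, bias=True):
    """Number of scalars a client would send per round for a linear head."""
    return feature_dim * num_classes + (num_classes if bias else 0)


# Two toy clients with a 1x2 head each; the second holds three times more data.
global_head = fedavg_linear_heads([[[1.0, 0.0]], [[3.0, 2.0]]], [1, 3])

# Hypothetical sizes: 512-dimensional features, 10 classes.
params_per_round = head_param_count(512, 10)
```

Under these assumed sizes a client sends a few thousand scalars per round, which is far smaller than the full set of weights of a deep image encoder.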
Abstract
Federated learning aims to tackle the "isolated data island" problem, where it trains a collective model from physically isolated clients while safeguarding the privacy of users' data. However, supervised federated learning necessitates that each client labels their data for training, which can be both time-consuming and resource-intensive, and may even be impractical for edge devices. Moreover, the training and transmission of deep models present challenges to the computation and communication capabilities of the clients. To address these two inherent challenges in supervised federated learning, we propose a novel lightweight unsupervised federated learning approach that leverages unlabeled data on each client to perform lightweight model training and communication by harnessing pretrained vision-language models, such as CLIP. By capitalizing on the zero-shot prediction capability and…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
Methods: Contrastive Language-Image Pre-training
