Federated Learning with Unlabeled Clients: Personalization Can Happen in Low Dimensions
Hossein Zakerinia, Jonathan Scott, Christoph H. Lampert

TL;DR
This paper introduces FLowDUP, a federated learning method enabling personalized models for unlabeled clients by operating in a low-dimensional space, supported by theoretical guarantees and extensive experiments.
Contribution
The paper proposes FLowDUP, a novel approach that generates personalized models from unlabeled data alone, with the model parameters confined to a low-dimensional subspace, and backs it with theoretical performance bounds.
Findings
Strong empirical performance across diverse datasets
Effective personalization with unlabeled data
Validated theoretical generalization guarantees
Abstract
Personalized federated learning has emerged as a popular approach to training on devices, known as clients, that hold statistically heterogeneous data. However, most existing approaches require a client to have labeled data for training or finetuning in order to obtain its own personalized model. In this paper we address this limitation by proposing FLowDUP, a novel method that generates a personalized model using only a forward pass with unlabeled data. The generated model parameters reside in a low-dimensional subspace, enabling efficient communication and computation. FLowDUP's learning objective is theoretically motivated by our new transductive multi-task PAC-Bayesian generalization bound, which provides performance guarantees for unlabeled clients. The objective is structured in such a way that it allows both clients with labeled data and clients with only unlabeled data to…
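The idea in the abstract of generating a personalized model with a single forward pass over unlabeled data, with parameters restricted to a low-dimensional subspace, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the summary-statistics embedding, the one-layer linear hypernetwork, the random basis, and every function name below are hypothetical choices made only for illustration.

```python
import numpy as np


def make_subspace(num_params, k, seed=0):
    """Hypothetical fixed basis: k random directions in parameter space."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((num_params, k)) / np.sqrt(k)


def client_embedding(x_unlabeled):
    """Summarize a client's unlabeled inputs (assumed: per-feature mean and std)."""
    return np.concatenate([x_unlabeled.mean(axis=0), x_unlabeled.std(axis=0)])


def hypernetwork(embedding, w_hyper):
    """Assumed one-layer hypernetwork: embedding -> k subspace coordinates."""
    return np.tanh(embedding @ w_hyper)


def personalize(x_unlabeled, theta_shared, basis, w_hyper):
    """Forward pass only: no labels and no gradient steps on the client."""
    coords = hypernetwork(client_embedding(x_unlabeled), w_hyper)
    # The personalized parameters differ from the shared ones only
    # inside the k-dimensional subspace spanned by the basis columns.
    return theta_shared + basis @ coords


# Toy usage with made-up sizes.
rng = np.random.default_rng(1)
d, num_params, k = 5, 40, 3
x_client = rng.standard_normal((20, d))      # unlabeled client data
basis = make_subspace(num_params, k)
w_hyper = rng.standard_normal((2 * d, k))    # stands in for learned weights
theta_shared = np.zeros(num_params)
theta_client = personalize(x_client, theta_shared, basis, w_hyper)
```

In a design like this, a client's personalization is fully described by `k` coordinates instead of `num_params` values, which is one way to read the abstract's claim of efficient communication and computation.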
Peer Reviews
Decision: Submitted to ICLR 2026
1. Addressing personalization for unlabeled clients is a relatively unexplored area in PFL. 2. The paper introduces a new transductive multi-task PAC-Bayesian bound that motivates the objective and gives its proof. 3. The low-dimensional subspace formulation has the potential to reduce communication and computation costs in federated settings.
1. The proposed method builds on regularization-based and hypernetwork-based personalized federated learning (PFL). However, the related work section does not provide an adequate review or a clear comparison between prior methods and the proposed approach. 2. The paper uses a hypernetwork for personalization, which is not a new idea. Besides, the authors emphasize that the proposed method can handle federated learning with labeled data, yet there is no direct comparison with exis
1. FLowDUP directly addresses a well-identified pain point of PFL (unlabeled client incompatibility) with a non-trivial solution. Unlike prior hypernetwork-based methods (e.g., Amosy et al. 2024; Scott et al. 2024), it uses low-dimensional subspaces to enable on-device hypernetwork deployment and leverages unlabeled clients for training, two key innovations that distinguish it from related work. 2. It explicitly connects the regularizer \(\Omega\) and loss \(L\) to generalization performance, providin
1. FLowDUP fails when conditional distributions \(D_{i|Y|X}\) (label given input) differ across clients but marginal distributions \(D_{i|X}\) (input alone) are insufficient to infer predictive models, e.g., subjective tasks like product recommendations or sentiment analysis. The authors acknowledge this but provide no mitigation strategies, restricting the method's real-world applicability. 2. The default \(k=10^4\) is chosen empirically, with no theoretical analysis of how \(k\) relates to dataset h
- The paper investigates an important problem in FL: clients without labels often exist and cannot gain anything from federated learning. - The paper provides theoretical insights into how the proposed algorithm's loss function uses labeled clients to benefit the unlabeled clients' personalized models. - The paper is clearly written.
- FLowDUP relies largely on the labeled clients' data, which raises a couple of concerns. First, if the labeled clients' data does not match the unlabeled clients' data well, which is often the case in federated learning due to data heterogeneity and cannot really be controlled since the data is private to each client, I believe the performance will suffer considerably. Moreover, it is rather unfair for the labeled clients to provide their training resource/data for the benefit of training pe
Taxonomy
Topics: Privacy-Preserving Technologies in Data
