Efficient Distribution Similarity Identification in Clustered Federated Learning via Principal Angles Between Client Data Subspaces
Saeed Vahidian, Mahdi Morafah, Weijia Wang, Vyacheslav Kungurtsev, Chen Chen, Mubarak Shah, and Bill Lin

TL;DR
This paper introduces a novel clustered federated learning method that efficiently identifies client data distribution similarities by analyzing principal angles between client data subspaces, enabling faster and more effective clustering.
Contribution
The paper proposes a direct, single-shot approach using principal angles and SVD to identify client data similarities, improving clustering efficiency in federated learning.
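As a rough illustration of the idea, the sketch below computes principal angles between client data subspaces with NumPy. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`client_basis`, `principal_angles`, `proximity_matrix`), the subspace dimension `p`, the features-by-samples data layout, the synthetic clients, and the choice of the smallest principal angle as the pairwise distance are all hypothetical choices made for this example.

```python
import numpy as np

def client_basis(data, p):
    """Top-p left singular vectors of a (features x samples) data matrix."""
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :p]

def principal_angles(u_a, u_b):
    """Principal angles (radians) between the spans of two orthonormal bases."""
    # The singular values of U_a^T U_b are the cosines of the principal angles.
    cosines = np.linalg.svd(u_a.T @ u_b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def proximity_matrix(bases):
    """Pairwise distance between clients: smallest principal angle (an assumption)."""
    n = len(bases)
    prox = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            prox[i, j] = prox[j, i] = principal_angles(bases[i], bases[j]).min()
    return prox

# Synthetic data (hypothetical): clients 0 and 1 draw from one 3-dimensional
# subspace, clients 2 and 3 from an orthogonal one, plus a little noise.
rng = np.random.default_rng(0)
dim, p, n_samples = 20, 3, 50
q, _ = np.linalg.qr(rng.normal(size=(dim, 2 * p)))
subspaces = [q[:, :p], q[:, :p], q[:, p:], q[:, p:]]

clients = [
    s @ rng.normal(size=(p, n_samples)) + 0.01 * rng.normal(size=(dim, n_samples))
    for s in subspaces
]

# Each client would compute its basis locally and share only that basis.
bases = [client_basis(x, p) for x in clients]
prox = proximity_matrix(bases)
print(np.round(prox, 2))
```

In this toy setup, clients that share a subspace end up with a small angle and clients from different subspaces with a large one, so any standard clustering routine applied to `prox` would separate the two groups in a single pass, before any training rounds.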
Findings
Enables rapid clustering based on data distribution similarities.
Provides convergence guarantees for non-convex federated learning objectives.
Addresses broad data heterogeneity issues beyond label skew.
Abstract
Clustered federated learning (FL) has been shown to produce promising results by grouping clients into clusters. This is especially effective in scenarios where separate groups of clients have significant differences in the distributions of their local data. Existing clustered FL algorithms are essentially trying to group together clients with similar distributions so that clients in the same cluster can leverage each other's data to better perform federated learning. However, prior clustered FL algorithms attempt to learn these distribution similarities indirectly during training, which can be quite time consuming as many rounds of federated learning may be required until the formation of clusters is stabilized. In this paper, we propose a new approach to federated learning that directly aims to efficiently identify distribution similarities among clients by analyzing the principal…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
