Bayesian Coreset Optimization for Personalized Federated Learning

Prateek Chanda; Shrey Modi; Ganesh Ramakrishnan

arXiv:2511.01800·cs.LG·November 4, 2025

TL;DR

This paper introduces a personalized coreset weighted federated learning method that reduces communication costs and improves generalization by selecting representative data points for each client, with theoretical guarantees and empirical validation.

Contribution

Proposes $\methodprop$, a novel personalized coreset approach for federated learning, with theoretical analysis and demonstrated improvements over existing sampling and subset selection methods.

Findings

01 · Theoretical bounds show minimax optimal generalization error up to logarithmic factors.

02 · Significant empirical gains on benchmark datasets with various federated architectures.

03 · Improved performance on medical datasets compared to submodular subset selection methods.
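
As a short worked step on the first finding, using only the rates quoted in the abstract and leaving the symbols $\beta$, $\Lambda$ and $\delta'$ as the paper defines them: dividing the upper bound by the lower bound cancels the polynomial term, so the gap between the two is purely logarithmic.

$$
\frac{n_k^{-\frac{2\beta}{2\beta+\Lambda}} \, \log^{2\delta'}(n_k)}{n_k^{-\frac{2\beta}{2\beta+\Lambda}}} = \log^{2\delta'}(n_k)
$$

For fixed $\delta'$, this factor grows more slowly than any power $n_k^{\varepsilon}$ with $\varepsilon > 0$, which is the sense in which the rate is called minimax optimal up to logarithmic factors.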

Abstract

In a distributed machine learning setting such as Federated Learning, where multiple clients send their individual weight updates to a single central server, training on each client's entire dataset often becomes cumbersome. To address this issue we propose $\methodprop$: a personalized coreset-weighted federated learning setup in which each client's training updates are forwarded to the central server based only on that client's coreset of representative data points rather than on its entire data. Through theoretical analysis we show that the average generalization error is minimax optimal up to logarithmic factors (upper bounded by $O(n_k^{-\frac{2\beta}{2\beta+\Lambda}} \log^{2\delta'}(n_k))$, with a lower bound of $O(n_k^{-\frac{2\beta}{2\beta+\Lambda}})$), and…
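
The general idea of coreset-weighted client updates can be illustrated with a small sketch. This is not the authors' method: it is a simplified, non-Bayesian and non-personalized toy that assumes a linear regression model, uses uniform sampling with equal weights as a stand-in for coreset construction, and averages client parameters on the server. All function names (`select_coreset`, `local_update`, `federated_round`, `run_demo`) and parameter values are hypothetical.

```python
import numpy as np

def select_coreset(X, y, k, rng):
    """Hypothetical stand-in for coreset construction (NOT the paper's method):
    pick k points uniformly and give each weight n/k; all other weights are 0."""
    n = X.shape[0]
    idx = rng.choice(n, size=k, replace=False)
    w = np.zeros(n)
    w[idx] = n / k
    return w

def local_update(theta, X, y, w, lr=0.1, steps=20):
    """Gradient descent on the coreset-weighted squared loss.
    Only points with non-zero weight are ever touched."""
    active = w > 0
    Xa, ya, wa = X[active], y[active], w[active]
    theta = theta.copy()
    for _ in range(steps):
        resid = Xa @ theta - ya
        grad = Xa.T @ (wa * resid) / wa.sum()
        theta -= lr * grad
    return theta

def federated_round(theta, clients, weights):
    """One communication round: each client trains on its coreset,
    then the server averages the returned parameters."""
    local = [local_update(theta, X, y, w) for (X, y), w in zip(clients, weights)]
    return np.mean(local, axis=0)

def run_demo(seed=0, n_clients=4, n=200, d=3, k=20, rounds=30):
    """Synthetic example: every client shares one true parameter vector."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([1.0, -2.0, 0.5])
    clients = []
    for _ in range(n_clients):
        X = rng.normal(size=(n, d))
        y = X @ theta_true + 0.01 * rng.normal(size=n)
        clients.append((X, y))
    weights = [select_coreset(X, y, k, rng) for X, y in clients]
    theta = np.zeros(d)
    for _ in range(rounds):
        theta = federated_round(theta, clients, weights)
    return theta, theta_true, weights

if __name__ == "__main__":
    theta, theta_true, _ = run_demo()
    print("estimate:", theta)
    print("target:  ", theta_true)
```

In this toy setup each client uses 20 of its 200 points per local step, which is where the saving in local computation comes from. A learned coreset would replace `select_coreset` with an optimization over the weights.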

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 3

Strengths

1. The paper is, for the most part, well written. There is not much work on coresets for federated learning, and as such the paper will be of interest to the community.
2. The authors have compared their method with a variety of baselines consisting of both federated learning algorithms and sampling strategies that incorporate diversity. Their method performs well in most of the cases.
3. The algorithm is backed with theoretical guarantees. I did not check the proofs, but the sta…

Weaknesses

1. I am not sure what the challenge is in incorporating the Bayesian coreset framework in the federated learning setting. It would be better to explain clearly why this is a significant contribution. Both the algorithm and proof techniques appear to be heavily inspired by Zhang 2022b. The only modification seems to be the use of Bayesian coresets.
2. There are minor grammatical errors. Please do a grammar check.

Reviewer 02 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

- The integration of Bayesian coresets with federated learning is innovative.
- In the context of personalized federated learning, this work presents new ideas and considerations for defining the objective in coreset computation, which differs from the commonly used coreset definition.

Weaknesses

- The paper's content is a bit bloated, and the use of notations can be messy. For instance, sections 3.2 and 4 could be condensed to make them more concise. Additionally, there is potential to simplify the formulaic aspect.
- It would be beneficial if the author could emphasize their novel contribution, distinguishing it from the techniques previously proposed by others. Currently, these ideas seem to be mixed within the intricate details of the interpretations.
- The overall architecture, as w…

Reviewer 03 · Rating 8 · accept, good paper · Confidence 2

Strengths

1. The idea of incorporating coreset optimization in FL is new and well-motivated.
2. Solid theoretical results are given.
3. Some optimistic empirical studies are presented.

Weaknesses

1. The major weakness is the lack of a convergence comparison in the empirical part. One of the major concerns in FL is the communication cost, so the number of iteration rounds is crucial. The reviewer suggests including not only a comparison of the final accuracy under different levels of sample complexity (not only 50%), but also the convergence speed, i.e., a communication cost comparison.
2. How expensive is it to calculate the coreset samples/weights? Is there a…

Videos

Bayesian Coreset Optimization for Personalized Federated Learning · SlidesLive

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning