Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models
Ziru Niu, Hai Dong, A.K. Qin

TL;DR
This paper introduces a novel federated learning framework that improves generalization across heterogeneous clients by sharing feature statistics and generating synthetic data, bypassing the need for parameter aggregation.
Contribution
It proposes a model-heterogeneous FL approach using shared feature distributions and generative models to enhance generalization without parameter sharing.
Findings
Achieves higher generalization accuracy than five baseline methods on four benchmark datasets under Dir(0.1) and Dir(1.0) non-IID settings.
Reduces communication costs and memory usage.
Effective for clients with different model architectures.
Abstract
Federated Learning (FL) is a privacy-preserving machine learning framework facilitating collaborative training across distributed clients. However, its performance is often compromised by data heterogeneity among participants, which can result in local models with limited generalization capability. Traditional model-homogeneous approaches address this issue primarily by regularizing local training procedures or dynamically adjusting client weights during aggregation. Nevertheless, these methods become unsuitable in scenarios involving clients with heterogeneous model architectures. In this paper, we propose a model-heterogeneous FL framework that enhances clients' generalization performance on unseen data without relying on parameter aggregation. Instead of model parameters, clients share feature distribution statistics (mean and covariance) with the server. Then each client trains a…
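The statistics-sharing step described in the abstract can be illustrated with a minimal sketch. The abstract only states that clients send the mean and covariance of their features; computing them per class, the function name `class_feature_stats`, and the random toy features are all assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def class_feature_stats(features, labels):
    """Return {class: (mean, covariance, count)} for classes present locally.

    Hypothetical helper: per-class grouping is an assumption, not confirmed
    by the abstract.
    """
    stats = {}
    for c in np.unique(labels):
        f = features[labels == c]
        mean = f.mean(axis=0)
        if len(f) > 1:
            # rowvar=False: rows are samples, columns are feature dimensions.
            cov = np.atleast_2d(np.cov(f, rowvar=False))
        else:
            # Covariance is undefined for a single sample; fall back to zeros.
            cov = np.zeros((f.shape[1], f.shape[1]))
        stats[int(c)] = (mean, cov, len(f))
    return stats

# Toy stand-in for features produced by a client's local feature extractor.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
labels = rng.integers(0, 3, size=200)
stats = class_feature_stats(features, labels)
```

Under this sketch a client uploads one mean vector and one covariance matrix per class, so the upload size depends on the feature dimension rather than on the size or architecture of the local model, which is what allows clients with different architectures to participate.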
Peer Reviews
Decision·ICLR 2026 Poster
S1. The paper is clear and easy to follow. S2. Fine-tuning clients' local models on synthetic data is not new, but the method used to generate the synthetic data is novel. S3. A significant improvement in average accuracy across clients is shown.
W1. Computation and Communication Overhead: The proposed method requires the server to train and distribute the VTC model to clients. In addition, clients must generate synthetic data locally. These steps introduce non-trivial computational overhead compared to standard federated learning. Please discuss the communication and computation requirements of the proposed method and compare them with those of other existing approaches. W2. Privacy Preservation: Although the authors discuss privacy pr
1. The paper proposes using FedVTC to generate synthetic data, which can be used to fine-tune local models and improve their generalization ability, thereby eliminating the reliance on public datasets. 2. FedVTC avoids exposure of raw data by transmitting prototypes, covariances, and VTC models, thus preventing additional privacy risks. 3. A new objective function is designed for training the VTC model, which includes the standard negative ELBO loss and a distribution matching (DM) loss, regul
1. Some symbols in the formulas are not defined, such as the symbol $V_i$ in Formula 5, which lacks a definition. 2. The proposed FedVTC framework does not introduce an entirely new solution or framework, but rather builds upon existing methods, which results in a somewhat limited level of innovation. 3. The comparative experiments seem to only compare with works published before 2025, without including comparisons with excellent works from 2025. 4. The flowchart in the paper is somewhat sim
The use of variational transposed convolution to produce synthetic samples for fine-tuning local models is a distinctive approach, avoiding dependency on public datasets or parameter aggregation. The method is evaluated on four benchmark datasets under varying non-IID settings (Dir(0.1) and Dir(1.0)), demonstrating superior generalization accuracy over five state-of-the-art baselines. FedVTC achieves lower communication costs than FedProto, FedTGP, FedGen, CCVR, and FedType and maintains c
Although FedVTC is compared to five baselines, the discussion overlooks potential overlaps or distinctions with contemporary approaches such as those using diffusion models or zero-shot learning. Generalization performance is measured solely by accuracy; additional metrics like per-class precision/recall or robustness under extreme non-IID conditions would strengthen the claims. The requirement for homogeneous VTC models for aggregation may limit applicability in fully heterogeneous setti
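The Dir(0.1) and Dir(1.0) settings mentioned in the reviews refer to Dirichlet-based non-IID partitioning. The sketch below shows one common label-skew scheme, in which each class is split across clients according to proportions drawn from a Dirichlet distribution. The paper's exact partitioning protocol is not given on this page, so the function `dirichlet_partition` and its parameters are assumptions for illustration only.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper: smaller alpha gives more skewed class proportions.
    """
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then cut them by sampled proportions.
        idx = rng.permutation(np.flatnonzero(labels == c))
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Toy labels: 1000 samples over 10 classes, split across 5 clients.
rng = np.random.default_rng(1)
labels = rng.integers(0, 10, size=1000)
parts = dirichlet_partition(labels, n_clients=5, alpha=0.1)
```

With `alpha=0.1` most clients tend to receive samples from only a few classes, while `alpha=1.0` yields milder skew; this is the sense in which Dir(0.1) is the harder of the two settings.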
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
