Bridging Generalization Gap of Heterogeneous Federated Clients Using Generative Models

Ziru Niu; Hai Dong; A.K. Qin

arXiv:2508.01669·cs.LG·February 16, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a novel federated learning framework that improves generalization across heterogeneous clients by sharing feature statistics and generating synthetic data, bypassing the need for parameter aggregation.

Contribution

It proposes a model-heterogeneous FL approach using shared feature distributions and generative models to enhance generalization without parameter sharing.

Findings

01. Achieves higher generalization accuracy than existing methods.

02. Reduces communication costs and memory usage.

03. Effective for clients with different model architectures.

Abstract

Federated Learning (FL) is a privacy-preserving machine learning framework facilitating collaborative training across distributed clients. However, its performance is often compromised by data heterogeneity among participants, which can result in local models with limited generalization capability. Traditional model-homogeneous approaches address this issue primarily by regularizing local training procedures or dynamically adjusting client weights during aggregation. Nevertheless, these methods become unsuitable in scenarios involving clients with heterogeneous model architectures. In this paper, we propose a model-heterogeneous FL framework that enhances clients' generalization performance on unseen data without relying on parameter aggregation. Instead of model parameters, clients share feature distribution statistics (mean and covariance) with the server. Then each client trains a…
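The statistics-sharing step described in the abstract can be illustrated with a minimal sketch. It assumes the mean and covariance are computed per class over feature vectors that a client has already extracted; the per-class grouping, the function name `class_feature_statistics`, and the plain-list data layout are illustrative assumptions, not details confirmed by the paper.

```python
from collections import defaultdict


def class_feature_statistics(features, labels):
    """Return {label: (mean, covariance)} for a client's feature vectors.

    Hypothetical helper: the per-class grouping is an assumption.
    features: list of equal-length lists of floats, one vector per sample.
    labels:   list of hashable class labels, same length as features.
    """
    by_class = defaultdict(list)
    for vec, label in zip(features, labels):
        by_class[label].append(vec)

    stats = {}
    for label, vecs in by_class.items():
        n, d = len(vecs), len(vecs[0])
        mean = [sum(v[j] for v in vecs) / n for j in range(d)]
        # Unbiased sample covariance; a single-sample class yields all zeros.
        denom = max(n - 1, 1)
        cov = [
            [
                sum((v[j] - mean[j]) * (v[k] - mean[k]) for v in vecs) / denom
                for k in range(d)
            ]
            for j in range(d)
        ]
        stats[label] = (mean, cov)
    return stats


if __name__ == "__main__":
    # Toy features: two samples of class 0, one sample of class 1.
    toy_features = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    toy_labels = [0, 0, 1]
    for label, (mean, cov) in class_feature_statistics(toy_features, toy_labels).items():
        print(label, mean, cov)
```

Under this sketch, only the `(mean, covariance)` pairs would leave the client, so the upload size depends on the feature dimension and the number of classes rather than on the size of the local model.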

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

S1. The paper is clear and easy to follow.

S2. Fine-tuning clients' local models on synthetic data is not new, but the method used to generate the synthetic data is novel.

S3. A significant improvement in average accuracy across clients is shown.

Weaknesses

W1. Computation and Communication Overhead: The proposed method requires the server to train and distribute the VTC model to clients. In addition, clients must generate synthetic data locally. These steps introduce non-trivial computational overhead compared to standard federated learning. Please discuss the communication and computation requirements of the proposed method and compare them with those of other existing approaches.

W2. Privacy Preservation: Although the authors discuss privacy pr

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The paper proposes using FedVTC to generate synthetic data, which can be used to fine-tune local models and improve their generalization ability, thereby eliminating the reliance on public datasets.

2. FedVTC avoids exposure of raw data by transmitting prototypes, covariances, and VTC models, thus preventing additional privacy risks.

3. A new objective function is designed for training the VTC model, which includes the standard negative ELBO loss and a distribution matching (DM) loss, regul

Weaknesses

1. Some symbols in the formulas are not defined, such as $V_i$ in Formula 5.

2. The proposed FedVTC framework does not introduce an entirely new solution or framework, but rather builds upon existing methods, which results in a somewhat limited level of innovation.

3. The comparative experiments seem to only compare with works published before 2025, without including comparisons with excellent works from 2025.

4. The flowchart in the paper is somewhat sim

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The use of variational transposed convolution to produce synthetic samples for fine-tuning local models is a distinctive approach, avoiding dependency on public datasets or parameter aggregation. The method is evaluated on four benchmark datasets under varying non-IID settings (Dir(0.1) and Dir(1.0)), demonstrating superior generalization accuracy over five state-of-the-art baselines. FedVTC achieves lower communication costs than FedProto, FedTGP, FedGen, CCVR, and FedType and maintains c

Weaknesses

Although FedVTC is compared to five baselines, the discussion overlooks potential overlaps or distinctions with contemporary approaches such as those using diffusion models or zero-shot learning. Generalization performance is measured solely by accuracy; additional metrics like per-class precision/recall or robustness under extreme non-IID conditions would strengthen the claims. The requirement for homogeneous VTC models for aggregation may limit applicability in fully heterogeneous setti

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning