The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics
Fabio Turazza, Marco Picone, Marco Mamei

TL;DR
This paper introduces the Gaussian-Head OFL family, a set of one-shot federated learning methods that use client statistics to build effective models without multiple communication rounds or additional data, achieving state-of-the-art results.
Contribution
The paper proposes the Gaussian-Head OFL family, enabling practical one-shot federated learning with minimal communication and no data sharing, based on class-conditional Gaussian assumptions.
Findings
Achieves state-of-the-art robustness under non-IID data
Operates without data sharing or multiple communication rounds
Provides effective model construction from client statistics
Abstract
Classical Federated Learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, with high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and enhancing practical deployability. Nevertheless, most existing one-shot approaches remain impractical or constrained: for example, they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class…
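Here is a minimal NumPy sketch of the idea in the abstract. It assumes each client has already computed pretrained embeddings and sends, for each class, a sample count, a feature sum, and a sum of outer products. These exact moments, the function names (`client_statistics`, `aggregate`, `gaussian_head_scores`), and the ridge term are illustrative assumptions, not the authors' implementation. Because the moments are plain sums, the server can aggregate them by addition and then build a diagonal naive Bayes, LDA, or QDA head in closed form.

```python
import numpy as np

def client_statistics(features, labels, num_classes):
    """Per-class count, feature sum and sum of outer products for one client."""
    d = features.shape[1]
    counts = np.zeros(num_classes)
    sums = np.zeros((num_classes, d))
    outer = np.zeros((num_classes, d, d))
    for c in range(num_classes):
        x = features[labels == c]
        counts[c] = len(x)
        sums[c] = x.sum(axis=0)
        outer[c] = x.T @ x
    return counts, sums, outer

def aggregate(client_stats):
    """The statistics are additive, so the server only has to sum them."""
    return tuple(sum(parts) for parts in zip(*client_stats))

def gaussian_head_scores(x, counts, sums, outer, kind="lda", ridge=1e-3):
    """Class scores log p(x | c) + log p(c), up to a constant shared by all classes."""
    num_classes, d = sums.shape
    means = sums / counts[:, None]
    # per-class (maximum-likelihood) covariance: E[x x^T] - mu mu^T
    covs = outer / counts[:, None, None] - np.einsum("ci,cj->cij", means, means)
    log_prior = np.log(counts / counts.sum())
    if kind == "lda":
        # one pooled covariance shared by all classes
        pooled = np.einsum("c,cij->ij", counts, covs) / counts.sum()
        covs = np.broadcast_to(pooled, covs.shape)
    elif kind == "nb_diag":
        covs = covs * np.eye(d)  # keep only the per-feature variances
    elif kind != "qda":
        raise ValueError(f"unknown head: {kind}")
    scores = np.empty((x.shape[0], num_classes))
    for c in range(num_classes):
        cov = covs[c] + ridge * np.eye(d)  # ridge keeps the matrix invertible
        diff = x - means[c]
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ni,ni->n", diff @ np.linalg.inv(cov), diff)
        scores[:, c] = -0.5 * (maha + logdet) + log_prior[c]
    return scores
```

Splitting one dataset across two clients and calling `aggregate` on their statistics yields the same moments as computing them on the pooled data. This is why a single upload is enough. The three heads differ only in how they use the second moments: LDA shares one pooled covariance across classes, QDA keeps one covariance per class, and the diagonal variant keeps only per-feature variances.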
Peer Reviews
Decision: ICLR 2026 Poster
1. The proposed method is strictly data-free, requiring no public datasets or client-side inference, which is a major advantage for privacy.
2. The use of sufficient statistics and random-projection sketches makes the approach highly communication-efficient.
3. The paper is generally well-written, with a clear narrative and a logical flow from problem to solution.
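The review credits random-projection sketches with saving bandwidth without breaking aggregation. Here is a small sketch of why that can work. It assumes, hypothetically, that all clients derive the same Gaussian projection matrix `R` from a pre-agreed seed and project each embedding before accumulating moments. Because projection is linear, the sum of projected moments equals the projection of the summed moments. The seed, the `1/sqrt(k)` scaling, and the helper names are assumptions made for illustration.

```python
import numpy as np

def shared_projection(d, k, seed=1234):
    # hypothetical: every client regenerates the same R from a seed agreed in advance
    rng = np.random.default_rng(seed)
    return rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

def sketched_moments(features, R):
    """Count, sum and second moment of the k-dimensional projected embeddings."""
    z = features @ R.T  # (n, k)
    return len(z), z.sum(axis=0), z.T @ z

def payload_floats(num_classes, dim):
    # per class: 1 count + dim sums + dim * dim second-moment entries
    return num_classes * (1 + dim + dim * dim)
```

With illustrative sizes of 100 classes, `payload_floats(100, 512)` gives 26,265,700 values for full second moments. `payload_floats(100, 64)` gives 416,100 values after projecting to 64 dimensions. The dimensions and the exact sketch used by GH-OFL are not stated in this excerpt.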
1. It relies on the core assumption that the feature embeddings for each class follow a Gaussian distribution, which may not hold perfectly in practice.
2. The Fisher subspace projection, while beneficial for dimensionality reduction, may lose discriminative information carried by the dropped dimensions.
3. The paper does not sufficiently explore the limitations of the Gaussianity assumption for certain types of data.
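One way to see the second concern is as follows. The Fisher subspace solves the generalized eigenproblem S_B v = λ S_W v. For C classes, the between-class scatter S_B has rank at most C - 1, so at most C - 1 directions carry class-mean separation. Differences in class covariances, which a QDA head could exploit, may lie outside those directions. The sketch below computes the subspace from aggregated moments using NumPy only, reducing the problem to a standard symmetric one through a Cholesky factor of S_W. The function name and the ridge term are assumptions, and this is not presented as the paper's exact procedure.

```python
import numpy as np

def fisher_subspace(counts, sums, outer, k, ridge=1e-3):
    """Top-k generalized eigenpairs of S_B v = lam * S_W v from per-class moments."""
    means = sums / counts[:, None]
    n = counts.sum()
    mu = sums.sum(axis=0) / n  # global mean
    d = means.shape[1]
    # within-class scatter: sum_c (outer_c - N_c mu_c mu_c^T), plus a small ridge
    s_w = outer.sum(axis=0) - np.einsum("c,ci,cj->ij", counts, means, means)
    s_w += ridge * np.eye(d)
    # between-class scatter: sum_c N_c (mu_c - mu)(mu_c - mu)^T, rank <= C - 1
    diff = means - mu
    s_b = np.einsum("c,ci,cj->ij", counts, diff, diff)
    # with S_W = L L^T and u = L^T v, the problem becomes L^-1 S_B L^-T u = lam u
    L_inv = np.linalg.inv(np.linalg.cholesky(s_w))
    eigvals, U = np.linalg.eigh(L_inv @ s_b @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:k]  # largest eigenvalues first
    V = L_inv.T @ U[:, order]  # map back: v = L^-T u
    return eigvals[order], V
```

If k is larger than C - 1, the extra eigenvalues come out numerically zero. This makes the rank limit visible in practice.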
+ The moment set is additively aggregable and sufficient to instantiate NB-diag/LDA/QDA; random-projection sketches reduce bandwidth and preserve additivity. The derivations for means/covariances and pooled covariance are explicit.
+ The paper motivates a Fisher-subspace generalized eigenproblem and uses data-free Gaussian sampling there to train FisherMix and Proto-Hyper, offering practical gains when closed-form heads are biased.
+ Results compare GH-OFL heads to OFL baselines across datase
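The same review mentions data-free Gaussian sampling in the Fisher subspace to train FisherMix and Proto-Hyper. Those heads are not specified in this excerpt, so the following is only a generic stand-in. It draws synthetic features from each class Gaussian and fits a linear softmax head by full-batch gradient descent. In practice, the inputs would be class means and a covariance projected into the Fisher subspace; here they are toy values. The learning rate, epoch count, and function names are illustrative assumptions.

```python
import numpy as np

def sample_gaussian_features(means, cov, per_class, rng):
    """Draw synthetic features from each class Gaussian; no real data is used."""
    xs, ys = [], []
    for c, m in enumerate(means):
        xs.append(rng.multivariate_normal(m, cov, size=per_class))
        ys.append(np.full(per_class, c))
    return np.vstack(xs), np.concatenate(ys)

def train_softmax_head(x, y, num_classes, lr=0.1, epochs=200):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = x.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(epochs):
        logits = x @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b
```

Training needs only the class means and covariance, which is the data-free property the reviewer highlights. A head trained this way can correct bias in the closed-form heads only to the extent that the sampled Gaussians match the real embedding distribution.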
- The datasets used are all computer-vision datasets, which limits the demonstrated generality of the proposed approach.
- The privacy discussion is not deep, and the privacy analysis is only qualitative.
- The experiments are not extensive; for example, FL-specific ablations (such as the number of clients) are not well discussed.
- Comprehensive alternatives are provided, each with a clear description.
- The module-level ablation study shows the importance and effect of each component.
- The computational overhead of data synthesis versus training with public datasets is analyzed in detail, with empirical evidence.
- An ablation study would better demonstrate the trade-off between communication and performance in one-shot FL.
- Standard multi-round FL baselines should be provided so that a general audience can see the advantage of one-shot FL. In the Dirichlet 0.5 setting, FedAvg reaches an accuracy of at least 86.02 on ResNet-18, which is not mentioned in the main table.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
