The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics

Fabio Turazza; Marco Picone; Marco Mamei

arXiv:2602.01186·cs.LG·February 3, 2026

Open Access 3 Reviews

TL;DR

This paper introduces the Gaussian-Head OFL family, a set of one-shot federated learning methods that use client statistics to build effective models without multiple communication rounds or additional data, achieving state-of-the-art results.

Contribution

The paper proposes the Gaussian-Head OFL family, enabling practical one-shot federated learning with minimal communication and no data sharing, based on class-conditional Gaussian assumptions.

Findings

01

Achieves state-of-the-art robustness under non-IID data

02

Operates without data sharing or multiple communication rounds

03

Provides effective model construction from client statistics

Abstract

Classical Federated Learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, with high communication costs and privacy risks from repeated model transmissions. In contrast, one-shot federated learning (OFL) alleviates these limitations by reducing communication to a single round, thereby lowering overhead and enhancing practical deployability. Nevertheless, most existing one-shot approaches remain either impractical or constrained: for example, they often depend on the availability of a public dataset, assume homogeneous client models, or require uploading additional data or model information. To overcome these issues, we introduce the Gaussian-Head OFL (GH-OFL) family, a suite of one-shot federated methods that assume class-conditional Gaussianity of pretrained embeddings. Clients transmit only sufficient statistics (per-class…
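As a rough illustration of the one-shot idea described in the abstract, the sketch below has each client compute per-class moments of its embeddings, which a server adds up and turns into a diagonal Gaussian head. The abstract is truncated before it lists the exact statistics, so the moment set used here (count, sum, sum of squares), the function names `client_stats`, `merge_stats`, and `nb_diag_predict`, and the variance floor `eps` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def client_stats(features, labels):
    """Per-class (count, sum, sum of squares) for one client's embeddings."""
    stats = {}
    for x, y in zip(features, labels):
        n, s, q = stats.get(y, (0, [0.0] * len(x), [0.0] * len(x)))
        stats[y] = (
            n + 1,
            [a + b for a, b in zip(s, x)],
            [a + b * b for a, b in zip(q, x)],
        )
    return stats

def merge_stats(per_client):
    """Server side: the moments are additive, so merging is an elementwise sum."""
    merged = {}
    for stats in per_client:
        for y, (n, s, q) in stats.items():
            n0, s0, q0 = merged.get(y, (0, [0.0] * len(s), [0.0] * len(q)))
            merged[y] = (
                n0 + n,
                [a + b for a, b in zip(s0, s)],
                [a + b for a, b in zip(q0, q)],
            )
    return merged

def nb_diag_predict(merged, x, eps=1e-6):
    """Classify x with a diagonal-covariance Gaussian head built from merged moments."""
    total = sum(n for n, _, _ in merged.values())
    best_label, best_score = None, -math.inf
    for y, (n, s, q) in merged.items():
        score = math.log(n / total)  # log class prior
        for xj, sj, qj in zip(x, s, q):
            mean = sj / n
            var = max(qj / n - mean * mean, 0.0) + eps  # floored variance (assumption)
            score -= 0.5 * (math.log(2 * math.pi * var) + (xj - mean) ** 2 / var)
        if score > best_score:
            best_label, best_score = y, score
    return best_label

# Toy non-IID split: each client holds a single class.
client_a = client_stats([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]], [0, 0, 0])
client_b = client_stats([[3.0, 3.1], [2.9, 3.2], [3.1, 2.8]], [1, 1, 1])
merged = merge_stats([client_a, client_b])
print(nb_diag_predict(merged, [0.1, 0.0]), nb_diag_predict(merged, [3.0, 3.0]))
```

Because the server only ever sums the uploaded moments, the merged head does not depend on how the samples were split across clients, which is one plausible reading of why such statistics suit a single communication round.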

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. The proposed method is strictly data-free, requiring no public datasets or client-side inference, which is a major advantage for privacy.
2. The use of sufficient statistics and random projection sketches makes the approach highly communication-efficient.
3. The paper is generally well-written, with a clear narrative and a logical flow from problem to solution.

Weaknesses

1. It relies on the core assumption that the feature embeddings for each class follow a Gaussian distribution, which may not hold perfectly in practice.
2. The Fisher subspace projection, while beneficial for dimensionality reduction, may discard some discriminative information present in the discarded dimensions.
3. The paper does not sufficiently explore the limitations of the Gaussianity assumption for certain types of data.

Reviewer 02·Rating 6·Confidence 4

Strengths

+ The moment set is additively aggregable and sufficient to instantiate NB-diag/LDA/QDA; random-projection sketches reduce bandwidth and preserve additivity. The derivations for means/covariances and pooled covariance are explicit.
+ The paper motivates a Fisher subspace generalized eigenproblem and uses data-free Gaussian sampling there to train FisherMix and Proto-Hyper, offering practical gains when closed-form heads are biased.
+ Results compare GH-OFL heads to OFL baselines across datase
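The reviewer's remark that random-projection sketches preserve additivity follows from linearity of projection, which the minimal sketch below tries to make concrete. The review does not say which projection the paper uses, so the Gaussian matrix, the shared `seed`, and the helper names `projection_matrix` and `sketch` are assumptions for illustration only.

```python
import random

def projection_matrix(d, k, seed):
    """k x d Gaussian matrix; a shared seed lets every party rebuild the same R."""
    rng = random.Random(seed)
    scale = 1.0 / k ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def sketch(R, v):
    """Project a d-dimensional statistic v down to k dimensions (R times v)."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]

d, k = 8, 3
R = projection_matrix(d, k, seed=0)

# Hypothetical per-class feature sums held by two clients.
sum_a = [float(i) for i in range(d)]
sum_b = [0.5 * (d - i) for i in range(d)]

# Each client uploads only its k-dimensional sketch; the server adds them.
server_total = [a + b for a, b in zip(sketch(R, sum_a), sketch(R, sum_b))]

# Linearity: adding sketches matches sketching the pooled sum.
direct = sketch(R, [a + b for a, b in zip(sum_a, sum_b)])
max_gap = max(abs(a - b) for a, b in zip(server_total, direct))
print(len(server_total), max_gap < 1e-9)
```

Under these assumptions each upload shrinks from d to k numbers per first-order statistic, while the server can still aggregate by plain addition.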

Weaknesses

- The datasets used are all computer-vision datasets, which limits the scalability of the proposed approach.
- The privacy discussion is not deep, and the privacy analysis is qualitative.
- The experiments in this paper are not extensive; for example, the ablation over FL settings (e.g., number of clients) is not well discussed.

Reviewer 03·Rating 6·Confidence 4

Strengths

- Comprehensive alternatives and correspondingly clear descriptions are provided.
- The ablation study on modules shows the importance and effect of each component.

Weaknesses

- A comparison of the computational overheads between data synthesis and training with public datasets should be analyzed in detail, with empirical evidence.
- An ablation study would better demonstrate the trade-off between communication and performance when applying one-shot FL.
- Baselines from general FL should be provided for a general audience to support the advantage of one-shot FL. In the Dirichlet 0.5 setting, FedAvg reaches an accuracy of at least 86.02 on ResNet-18, which is not mentioned in the main table.

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks