Federated Learning with Profile Mapping under Distribution Shifts and Drifts
Mohan Li, Dario Fenoglio, Martin Gjoreski, Marc Langheinrich

TL;DR
This paper introduces Feroma, a federated learning framework that effectively manages distribution shifts and drifts across clients using privacy-preserving profile representations, enabling robust model aggregation and deployment without prior client data knowledge.
Contribution
Feroma is a novel FL method that utilizes distribution profiles for adaptive aggregation, handling both shifts and drifts without relying on client identities or prior data assumptions.
Findings
Feroma outperforms 10 state-of-the-art methods with up to 12% accuracy improvement.
It maintains computational and communication efficiency comparable to FedAvg.
It remains robust under dynamic data heterogeneity conditions.
Abstract
Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often fail to address distribution shift across clients and distribution drift over time, or they rely on unrealistic assumptions such as a known number of client clusters and known data heterogeneity types, which limits their generalizability. We introduce Feroma, a novel FL framework that explicitly handles both distribution shift and drift without relying on client or cluster identity. Feroma builds on client distribution profiles (compact, privacy-preserving representations of local data) that guide model aggregation and test-time model assignment through adaptive similarity-based weighting. This design allows Feroma to dynamically select aggregation strategies during training, ranging from clustered to…
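The similarity-based weighting described in the abstract can be illustrated with a minimal sketch. This is not Feroma's actual algorithm: the cosine similarity, the softmax with a `temperature` parameter, the flat parameter vectors, and the function names `similarity_weights`, `aggregate`, and `assign_model` are all assumptions made for illustration. Profiles are treated as plain vectors, and how they are extracted or privatized is out of scope.

```python
import numpy as np


def _unit_rows(x):
    """Scale each row to unit L2 norm (guarding against zero rows)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def similarity_weights(profiles, temperature=0.1):
    """Row-stochastic weights from pairwise cosine similarity of profiles.

    Hypothetical choice: softmax over cosine similarity. Row i holds the
    weights client i places on every client's model.
    """
    unit = _unit_rows(profiles)
    logits = (unit @ unit.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def aggregate(client_params, weights):
    """One aggregated parameter vector per client: weights @ params."""
    return np.asarray(weights) @ np.asarray(client_params, dtype=float)


def assign_model(test_profile, train_profiles):
    """Index of the training profile most similar to a test-time profile."""
    unit = _unit_rows(train_profiles)
    query = _unit_rows(np.asarray(test_profile, dtype=float)[None, :])[0]
    return int(np.argmax(unit @ query))


# Toy data: clients 0 and 1 have similar profiles, client 2 differs.
profiles = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
params = np.array([[1.0], [1.0], [5.0]])  # one scalar "model" per client

sharp = aggregate(params, similarity_weights(profiles, temperature=0.05))
flat = aggregate(params, similarity_weights(profiles, temperature=1e6))
chosen = assign_model([0.1, 0.95], profiles)
```

In this sketch, a small `temperature` makes each client aggregate almost only with clients whose profiles resemble its own, while a very large one yields nearly uniform weights, which amounts to plain averaging. The test-time step simply picks the model whose training profile best matches the new profile.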
Peer Reviews
Decision·ICLR 2026 Poster
1. FEROMA introduces a differentially private distribution profile for each client. 2. The method addresses diverse FL scenarios in a single framework.
1. This paper lacks some necessary comparisons and discussion in the related work and experiments, e.g., [R1-R3]. Recent alternative approaches for handling distribution shifts and drifts in FL are omitted [R4-R5]. 2. How would the DPE behave under significant violations of its underlying assumptions, e.g., highly multi-modal, long-tailed, or high-dimensional client feature spaces? Is there any empirical or theoretical work supporting extension of the current implementation to natural language or tim
+ This paper contains extensive experimental evaluation. The paper includes thorough comparisons with 10 strong baselines across six datasets (including real-world ones like CheXpert and Office-Home). Performance gains are consistent (e.g., +14.1pp on MNIST, +10.3pp on CIFAR-10, Table 1). Scalability experiments up to 100 clients and drift-frequency ablations are convincing. + The paper is well-structured and visually clear, with informative figures (e.g., Figures 1–3) and theoretical justific
- Writing quality is high, though a few sections are dense and could benefit from intuitive explanations or examples (e.g., DPE stochasticity (R3)). The paper includes an extensive set of experimental results across multiple datasets, baselines, and drift/shift scenarios, which demonstrates strong empirical effort and thorough evaluation. However, the presentation could be improved for readability. Currently, the sheer density of numerical results and tables makes it difficult for readers to extr
The proposed method introduces a unified approach that can handle both distribution shift and distribution drift. It covers scenarios that previous FL methods treat separately. This generalization is meaningful and provides a comprehensive theoretical foundation for dynamic non-IID conditions. Further, the design of the Distribution Profile Extractor (DPE), with explicit differential privacy guarantees, bounded stochasticity, and theoretical validation, supports a strong methodolog
This reviewer has some concerns about weaknesses. The effectiveness of the proposed method depends on the quality of the DPE. However, the DPE itself requires a pretrained model. The manuscript even states “the effectiveness of FEROMA similarly depends on the quality of its DPE, which must generate reliable representations of client data distributions”. Poor representation quality can undermine profile fidelity and mapping accuracy, and ultimately make the method less reliable in low-resource scenario
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Stream Mining Techniques · Mobile Crowdsensing and Crowdsourcing
