Toward Enhancing Representation Learning in Federated Multi-Task Settings
Mehdi Setayesh, Mahdi Beitollahi, Yasser H. Khalil, Hongliang Li

TL;DR
This paper introduces FedMuscle, a federated multi-task learning method that uses a novel contrastive loss to learn shared representations across diverse tasks and models, improving performance in heterogeneous settings.
Contribution
It proposes Muscle loss, a contrastive learning objective for federated multi-task learning that captures dependencies across tasks, enabling effective learning with model and task heterogeneity.
Findings
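FedMuscle outperforms state-of-the-art baselines, including FedHeNN, CoFED, and CreamFL, across diverse tasks.
The method handles both model and task heterogeneity.
Experiments on image models (ViT, SegFormer) and text models (BERT, DistilBERT) show substantial performance improvements.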
Abstract
Federated multi-task learning (FMTL) seeks to collaboratively train customized models for users with different tasks while preserving data privacy. Most existing approaches assume model congruity (i.e., the use of fully or partially homogeneous models) across users, which limits their applicability in realistic settings. To overcome this limitation, we aim to learn a shared representation space across tasks rather than shared model parameters. To this end, we propose Muscle loss, a novel contrastive learning objective that simultaneously aligns representations from all participating models. Unlike existing multi-view or multi-model contrastive methods, which typically align models pairwise, Muscle loss can effectively capture dependencies across tasks because its minimization is equivalent to the maximization of mutual information among all the models' representations. Building on this…
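For orientation, here is a minimal sketch of the pairwise alignment that the abstract contrasts Muscle loss against. It is not the paper's Muscle loss, whose exact form is not given in this summary. It averages a standard InfoNCE loss over every ordered pair of models on one shared batch. The function names, the temperature `tau`, and the assumption that every model's representation has already been projected to a common dimension `d` are illustrative choices, not details from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(za, zb, tau=0.1):
    """Standard InfoNCE between two (B, d) batches.

    Row i of za and row i of zb are the positive pair (same input sample);
    every other row of zb acts as a negative for row i of za.
    """
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / tau                              # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def pairwise_alignment_loss(reps, tau=0.1):
    """Average InfoNCE over all ordered pairs of models (pairwise baseline).

    reps: list of M arrays of shape (B, d), one per model, computed on the
    same B samples (assumed here to be already projected to a common d).
    """
    m = len(reps)
    losses = [info_nce(reps[i], reps[j], tau)
              for i in range(m) for j in range(m) if i != j]
    return float(np.mean(losses))


# Toy check: three "models" whose representations nearly agree should
# score a lower loss than three models with unrelated representations.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))
aligned = [shared + 0.01 * rng.normal(size=(8, 16)) for _ in range(3)]
unrelated = [rng.normal(size=(8, 16)) for _ in range(3)]

aligned_loss = pairwise_alignment_loss(aligned)
unrelated_loss = pairwise_alignment_loss(unrelated)
print(aligned_loss, unrelated_loss)
```

Because each term only looks at two models at a time, the number of terms grows quadratically with the number of models, and no single term depends on more than two representations at once. That is the limitation the abstract attributes to pairwise methods.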
Peer Reviews
Decision: ICLR 2026 Poster
The paper identifies a common limiting assumption among federated multi-task learning approaches (model congruity) and proposes a new method, FedMuscle, to overcome it. It justifies the method with theoretical results showing that the approach maximizes the mutual information among the models. It evaluates FedMuscle against various baseline algorithms in experiments on both image (ViT, SegFormer) and text (BERT, DistilBERT) domains.
Requiring a shared public dataset is a very strong assumption, particularly in federated learning scenarios. The paper addresses this on page 4, lines 163 to 165, by suggesting publicly available datasets or synthetic data samples, but synthetic data raises concerns about model collapse (though those concerns mostly involve data recursively generated by the model or model family itself, which reinforces its own biases, and the issue is resolved by adding non-synthetic data which the local use
1. The authors provide a solid theoretical analysis of the proposed Muscle loss function.
2. The proposed FedMuscle algorithm can be applied to various tasks, including computer vision and natural language processing, demonstrating its capability to handle model and task heterogeneity in federated learning.
3. The contrastive Muscle loss can be seamlessly integrated into multimodal approaches.
From a federated learning theory perspective, the work lacks proofs of convergence and generalization. Providing them would be a substantial undertaking, so a more in-depth theoretical analysis could be considered and refined in future research.
The paper covers a wide range of model structures and task types. The MI lower-bound perspective is nicely integrated with the contrastive framework. The evaluation covers both unimodal and multimodal configurations, and the results are consistent. There is clear improvement over strong baselines (FedHeNN, CoFED, CreamFL, and Muscle Loss).
The Muscle loss is simply a weighted multi-view InfoNCE variant; comparable theories exist, such as Gramian losses and multi-view MI maximization. Dependence on a shared public dataset undermines the privacy argument and limits usefulness in confined contexts. There are no theoretical guarantees for FedMuscle's convergence or stability under client heterogeneity. The ablations are weak: there is no sensitivity analysis on critical hyperparameters (τ, M, B) or comparison to adaptive optimizers (FedA
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Face recognition and analysis
