On the Convergence and Stability of Distributed Sub-model Training

Yuyang Deng; Fuli Qiao; Mehrdad Mahdavi

arXiv:2511.06132 · cs.LG · November 11, 2025

Open Access

TL;DR

This paper introduces a distributed shuffled sub-model training method for federated learning, demonstrating improved convergence and generalization stability through theoretical analysis and extensive experiments.

Contribution

It proposes a novel shuffled sub-model training approach with convergence guarantees and stability-based generalization improvements.

Findings

01

The algorithm's convergence rate is established theoretically.

02

Shuffling enhances training stability and generalization.

03

Experimental results support the theoretical claims.

Abstract

As learning models continue to grow in size, enabling on-device local training of these models has emerged as a critical challenge in federated learning. A popular solution is sub-model training, where the server distributes only randomly sampled sub-models to the edge clients, and clients update only these small models. However, such random sampling of sub-models may not yield satisfactory convergence performance. In this paper, motivated by the success of SGD with shuffling, we propose distributed shuffled sub-model training, in which the full model is partitioned into several sub-models in advance; at each round, the server shuffles those sub-models and sends each of them to clients, and at the end of the local updating period, clients send back the updated sub-models and the server averages them. We establish the convergence rate of this algorithm. We also study the generalization of distributed…
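The shuffled sub-model procedure in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration, not the authors' implementation. The model is a flat parameter vector split into contiguous coordinate blocks (the sub-models). Each client minimizes a hypothetical quadratic loss f_i(w) = 0.5 * ||w - c_i||^2. Each round, the server draws a fresh permutation of the K blocks and gives client i the block perm[i mod K]. Clients run plain gradient steps on their block only, and the server averages the returned copies of each block. The names `partition`, `shuffled_assignment`, `random_assignment`, `train`, and `global_loss` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical setup, not the authors' code: the full model is a flat list of
# floats, and sub-models are contiguous coordinate blocks fixed in advance.


def partition(dim: int, num_blocks: int) -> List[range]:
    """Split coordinates 0..dim-1 into num_blocks contiguous index ranges."""
    size, extra = divmod(dim, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(range(start, end))
        start = end
    return blocks


def shuffled_assignment(num_clients: int, num_blocks: int, rng: random.Random) -> List[int]:
    """Assumed rule: shuffle the blocks each round; client i gets perm[i % num_blocks]."""
    perm = list(range(num_blocks))
    rng.shuffle(perm)
    return [perm[i % num_blocks] for i in range(num_clients)]


def random_assignment(num_clients: int, num_blocks: int, rng: random.Random) -> List[int]:
    """Baseline for contrast: each client samples a block independently."""
    return [rng.randrange(num_blocks) for _ in range(num_clients)]


def train(
    targets: Sequence[Sequence[float]],
    num_blocks: int,
    rounds: int,
    local_steps: int,
    lr: float,
    assign: Callable[[int, int, random.Random], List[int]] = shuffled_assignment,
    seed: int = 0,
) -> List[float]:
    """Sub-model training with hypothetical client losses f_i(w) = 0.5 * ||w - targets[i]||^2."""
    rng = random.Random(seed)
    dim = len(targets[0])
    blocks = partition(dim, num_blocks)
    w = [0.0] * dim  # global model held by the server
    for _round in range(rounds):
        assignment = assign(len(targets), num_blocks, rng)
        returned: Dict[int, List[List[float]]] = {b: [] for b in range(num_blocks)}
        for client, b in enumerate(assignment):
            # The client receives only the coordinates of its sub-model.
            sub = [w[j] for j in blocks[b]]
            for _step in range(local_steps):
                # Gradient of f_i restricted to the block is (sub - target block).
                sub = [x - lr * (x - targets[client][j]) for x, j in zip(sub, blocks[b])]
            returned[b].append(sub)
        # Server: average the returned copies of each block; unassigned blocks keep old values.
        for b, subs in returned.items():
            if subs:
                for k, j in enumerate(blocks[b]):
                    w[j] = sum(s[k] for s in subs) / len(subs)
    return w


def global_loss(w: Sequence[float], targets: Sequence[Sequence[float]]) -> float:
    """Average client loss at w."""
    return sum(0.5 * sum((x - t) ** 2 for x, t in zip(w, c)) for c in targets) / len(targets)


if __name__ == "__main__":
    # Four hypothetical clients with slightly different targets, four sub-models.
    targets = [[10.0 + 0.1 * i + 0.01 * j for j in range(8)] for i in range(4)]
    w = train(targets, num_blocks=4, rounds=50, local_steps=5, lr=0.1)
    print(global_loss(w, targets) < global_loss([0.0] * 8, targets))  # prints True
```

With at least as many clients as blocks, this assignment rule updates every block in every round. In contrast, `random_assignment` (the independent-sampling baseline) can leave some blocks untouched in a given round. This coverage property belongs to the sketch only; the paper's own analysis is what establishes the convergence and stability results.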

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning