Shard the Gradient, Scale the Model: Serverless Federated Aggregation via Gradient Partitioning

Amine Barrak

arXiv:2604.22072·cs.DC·April 27, 2026

TL;DR

This paper introduces GradsSharding, a gradient partitioning method for serverless federated learning that enables scalable aggregation of large models by dividing gradients into shards, maintaining accuracy and reducing costs.

Contribution

The paper proposes a novel gradient sharding approach that overcomes memory limitations in serverless federated learning, allowing for larger models without accuracy loss.

Findings

01. GradsSharding achieves up to 2.7x cost reduction on VGG-16.

02. It enables aggregation of models up to 5 GB in size.

03. The approach remains deployable beyond existing serverless memory limits.

Abstract

Federated learning (FL) aggregation on serverless platforms faces a hard scalability ceiling: existing architectures (lambda-FL, LIFL) partition clients across aggregators, but every aggregator must hold the complete model gradient in memory. When gradients exceed the per-function memory limit (e.g., 10 GB on AWS Lambda), aggregation becomes infeasible regardless of tree depth or branching factor. We propose GradsSharding, which instead partitions the gradient tensor into M shards, each averaged independently by a serverless function that receives contributions from all clients. Because FedAvg averaging is element-wise, this produces bit-identical results to tree-based approaches, so model accuracy is invariant by construction. Per-function memory is bounded at O(|θ|/M), independent of client count, enabling aggregation of arbitrarily large models. We evaluate GradsSharding…
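The sharding idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`split_into_shards`, `average`, `sharded_average`), the contiguous split, and the use of an unweighted mean are assumptions made for illustration, and the loop over shards stands in for what would be M independent serverless function invocations. It shows why averaging each shard separately and concatenating gives the same result as averaging the whole gradient at once, provided each element is summed over clients in the same order.

```python
from typing import List

Gradient = List[float]


def split_into_shards(grad: Gradient, num_shards: int) -> List[Gradient]:
    """Split a flat gradient into contiguous shards whose sizes differ by at most 1."""
    base, extra = divmod(len(grad), num_shards)
    shards, start = [], 0
    for m in range(num_shards):
        end = start + base + (1 if m < extra else 0)
        shards.append(grad[start:end])
        start = end
    return shards


def average(vectors: List[Gradient]) -> Gradient:
    """Element-wise unweighted mean, summing clients in list order."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sharded_average(client_grads: List[Gradient], num_shards: int) -> Gradient:
    """Average each shard independently, then concatenate the shard results."""
    per_client = [split_into_shards(g, num_shards) for g in client_grads]
    result: Gradient = []
    for m in range(num_shards):
        # Hypothetical stand-in for one serverless function: it only ever
        # sees shard m from every client, never the full gradient.
        shard_m = [shards[m] for shards in per_client]
        result.extend(average(shard_m))
    return result


# Usage: three clients, a 10-element gradient, four shards.
client_grads = [[0.1 * i + c for i in range(10)] for c in range(3)]
assert sharded_average(client_grads, 4) == average(client_grads)
```

In this sketch each aggregation step handles only about 1/M of the elements per client, which mirrors the O(|θ|/M) per-function bound stated in the abstract; how client contributions are buffered or streamed inside a real function is not specified here.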
