SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning

Avetik Karagulyan; Egor Shulgin; Abdurakhmon Sadiev; Peter Richtárik

arXiv:2405.20127 · math.OC · May 31, 2024 · 1 citation

TL;DR

The paper introduces SPAM, a federated learning algorithm for cross-device settings with non-convex losses. It addresses client drift and the insensitivity of standard methods to data similarity, and its analysis extends to partial client participation.

Contribution

The paper presents what the authors describe as the first method of its kind that does not require smoothness of the objective yet provably benefits from data similarity, with a sharp analysis under second-order (Hessian) similarity for non-convex federated learning.

Findings

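01

Proves that convergence benefits from similarity between clients' data.

02

Extends the analysis to partial participation, where only a cohort of clients communicates with the server in each round.

03

Applies to non-convex losses without requiring smoothness of the objective.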

Abstract

Cross-device training is a crucial subfield of federated learning, where the number of clients can reach into the billions. Standard approaches and local methods are prone to issues such as client drift and insensitivity to data similarities. We propose a novel algorithm (SPAM) for cross-device federated learning with non-convex losses, which solves both issues. We provide sharp analysis under second-order (Hessian) similarity, a condition satisfied by a variety of machine learning problems in practice. Additionally, we extend our results to the partial participation setting, where a cohort of selected clients communicates with the server at each communication round. Our method is the first of its kind that does not require the smoothness of the objective and provably benefits from clients having similar data.
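
For readers unfamiliar with the term, second-order (Hessian) similarity is often formalized in the distributed optimization literature roughly as below. This is an illustrative sketch, not the paper's exact assumption: the symbols f_i (loss of client i), f (average loss), n (number of clients), and \delta (similarity constant) are introduced here and do not appear in the abstract, and the paper may use a different or weaker variant.

```latex
% Illustrative only: a common textbook-style form of Hessian similarity.
% f_i, f, n, and \delta are assumed notation, not taken from the abstract.
f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x),
\qquad
\left\| \nabla^2 f_i(x) - \nabla^2 f(x) \right\| \le \delta
\quad \text{for all } x \text{ and all } i \in \{1,\dots,n\}.
```

Under this form, \delta measures how much each client's curvature deviates from the average: identical client data gives \delta = 0, and a small \delta is the regime in which a method can be expected to benefit from data similarity.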


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Random Matrices and Applications