Accelerated Methods with Complexity Separation Under Data Similarity for Federated Learning Problems

Dmitry Bylinkin; Sergey Skorik; Dmitriy Bystrov; Leonid Berezin; Aram Avetisyan; Aleksandr Beznosikov

arXiv:2601.08614 · math.OC · January 14, 2026

TL;DR

This paper introduces new federated learning algorithms that address data heterogeneity by leveraging data similarity, yielding more communication-efficient methods that are supported by theoretical analysis and experiments.

Contribution

It formalizes data heterogeneity as an optimization problem with a computationally heavy composite under data similarity, develops communication-efficient algorithms under several sets of assumptions, and proposes an optimal algorithm for the convex case.

Findings

01

Proposed algorithms outperform existing methods in communication efficiency.

02

Validated the theoretical guarantees through a series of experiments.

03

Demonstrated effectiveness on federated image classification tasks (MNIST and CIFAR-10).

Abstract

Heterogeneity in the data distribution poses a challenge in many modern federated learning tasks. We formalize it as an optimization problem involving a computationally heavy composite under data similarity. By employing different sets of assumptions, we present several approaches to develop communication-efficient methods. An optimal algorithm is proposed for the convex case. The constructed theory is validated through a series of experiments across various problems.
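
One way to write down the problem described in the abstract, as a hedged sketch: the notation $h = f + g$, $M_f$, $M_g$, $\delta_f$, $\delta_g$ is borrowed from the reviews below, while the uniform averaging over each client group and the operator-norm form of Hessian similarity are our assumptions, not necessarily the paper's exact formulation.

$$
\min_{x \in \mathbb{R}^d} \; h(x) = f(x) + g(x), \qquad
f(x) = \frac{1}{|M_f|} \sum_{m \in M_f} f_m(x), \qquad
g(x) = \frac{1}{|M_g|} \sum_{m \in M_g} g_m(x),
$$

$$
\|\nabla^2 f_m(x) - \nabla^2 f(x)\| \le \delta_f \quad (m \in M_f), \qquad
\|\nabla^2 g_m(x) - \nabla^2 g(x)\| \le \delta_g \quad (m \in M_g).
$$

Keeping two constants instead of a single worst-case $\delta$ is presumably what allows the communication cost for each client group to be stated separately.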

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 2

Strengths

1. The paper solves a quite relevant and more realistic problem in federated learning setups when considering real data.
2. The paper is developed rigorously and is well-positioned in the existing state-of-the-art.
3. The paper splits up the optimization problem into two functions corresponding to frequent and rare data, and uses the Hessian similarity of the two functions to develop communication-efficient methods.
4. The paper also derives the complexities of the methods in terms of communi…

Weaknesses

1. The paper states (line 162) that the theory is validated through experiments on a "diverse set of tasks". This seems to be an overstatement, since the experiments are only conducted on two image classification tasks on MNIST and CIFAR-10.
2. The paper is quite challenging to read, with both prior and present results being described in the running text. I strongly encourage the authors to make the text more readable, as that will (probably) also increase the impact of their work. I find the i…

Reviewer 02 · Rating 0 · Confidence 4

Strengths

The work extends prior literature on Hessian similarity and complexity separation to the setting where two separate data modes have different similarity conditions.

Weaknesses

1. The core contribution appears somewhat unjustified and seems like a convenient simplification. The authors propose decomposing the client functions into two composite parts representing “common” and “rare” data modes distributed over different client sets. However, a natural question arises: why restrict the heterogeneity to only two partitions? Since client heterogeneity is often modeled by continuous distributions (e.g., Dirichlet), segmenting into just two groups feels like a simplistic sh…

Reviewer 03 · Rating 6 · Confidence 2

Strengths

1. The paper brings together the **composite structure** $h=f+g$ and **Hessian similarity**, explicitly distinguishing $\delta_f$ and $\delta_g$ and using this split to guide sampling (and scheduling). This constitutes a natural yet underexplored extension of the “similarity + acceleration/variance-reduction” literature.
2. Starting from a strong-convexity baseline, it first provides an upper bound with separated complexities (Thm. 1), then applies variance-reduction ideas to tighten the ra…
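
To make the distinction between $\delta_f$ and $\delta_g$ concrete, here is a minimal sketch, assuming quadratic client losses $\tfrac12 x^\top A_m x$ in two dimensions and the spectral norm, so that each constant is $\max_m \|A_m - \bar A\|_2$ over its own client group. The helper names (`spectral_norm`, `mean_hessian`, `similarity_constant`) and all Hessian values are hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import List, Tuple

# Hypothetical toy setup (assumption): each client m has a quadratic loss 0.5 * x^T A_m x
# in 2D, so its Hessian is the constant symmetric matrix A_m, stored as (a, b, c) = [[a, b], [b, c]].
Sym2 = Tuple[float, float, float]


def spectral_norm(m: Sym2) -> float:
    """Largest absolute eigenvalue of a symmetric 2x2 matrix, via the closed form."""
    a, b, c = m
    center = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return max(abs(center + radius), abs(center - radius))


def mean_hessian(hessians: List[Sym2]) -> Sym2:
    """Entrywise average of the client Hessians (the Hessian of the group-average loss)."""
    n = len(hessians)
    return tuple(sum(h[i] for h in hessians) / n for i in range(3))


def similarity_constant(hessians: List[Sym2]) -> float:
    """delta = max_m ||A_m - mean(A)||_2 over one client group (assumed definition)."""
    avg = mean_hessian(hessians)
    return max(spectral_norm(tuple(h[i] - avg[i] for i in range(3))) for h in hessians)


# Illustrative values only: "frequent" clients have nearly identical Hessians,
# while "rare" clients differ a lot from each other.
frequent_clients = [(2.0, 0.0, 1.0), (2.2, 0.1, 1.0), (1.8, -0.1, 1.0)]
rare_clients = [(5.0, 0.0, 0.5), (1.0, 1.0, 3.0)]

delta_f = similarity_constant(frequent_clients)
delta_g = similarity_constant(rare_clients)
print(f"delta_f = {delta_f:.3f}, delta_g = {delta_g:.3f}")  # delta_f = 0.241, delta_g = 2.075
```

In this toy setup the tightly clustered group has a much smaller constant than the heterogeneous one, which is the kind of gap that treating the two groups separately, rather than with one worst-case constant, is meant to exploit.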

Weaknesses

1. The experiments are limited to MNIST-MLP and CIFAR-10-ResNet18, and primarily report the number of communication rounds on the $M_f$ side. The paper **does not report** (i) metrics on the $M_g$ side and (ii) total communication volume (e.g., counts of vector exchanges). It also lacks evaluations on more **realistic federated settings** (standard FL benchmarks). These omissions restrict the external validity and practical relevance of the claims.
2. The manuscript uses multiple, partially i…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security