SPARK: Igniting Communication-Efficient Decentralized Learning via Stage-wise Projected NTK and Accelerated Regularization

Li Xia

arXiv:2512.12737 · cs.LG · December 16, 2025

TL;DR

SPARK is a decentralized learning method that combines Jacobian compression, stage-wise distillation, and momentum acceleration to cut communication by 98.7% relative to NTK-DFL while maintaining convergence speed and accuracy.

Contribution

The paper presents a new framework integrating random projection, annealed distillation, and Nesterov momentum to enhance communication efficiency and convergence in decentralized federated learning.
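
As a rough illustration of how an annealed distillation weight and Nesterov momentum might interact, the sketch below runs both on a toy quadratic objective. Everything in it is an assumption made for illustration: the function names `distill_weight` and `nesterov_step`, the piecewise-constant schedule, the hyperparameters, and the toy loss are hypothetical and are not taken from the paper.

```python
import numpy as np

def distill_weight(round_idx, rounds_per_stage=10, num_stages=4, lam_max=0.5):
    """Hypothetical stage-wise schedule: weight 0 in the first stage
    (no neighbor regularization), rising in equal steps to lam_max."""
    stage = min(round_idx // rounds_per_stage, num_stages - 1)
    return lam_max * stage / (num_stages - 1)

def nesterov_step(theta, velocity, grad_fn, lr=0.1, mu=0.9):
    """One Nesterov update: the gradient is evaluated at the look-ahead point."""
    lookahead = theta + mu * velocity
    velocity = mu * velocity - lr * grad_fn(lookahead)
    return theta + velocity, velocity

# Toy loss (assumed): 0.5*||theta - target||^2 + 0.5*lam*||theta - neighbor_avg||^2
target = np.array([1.0, -2.0])        # stands in for the local objective's optimum
neighbor_avg = np.array([0.8, -1.6])  # stands in for the neighbors' consensus
theta = np.zeros(2)
velocity = np.zeros(2)

for t in range(100):
    lam = distill_weight(t)
    # Bind lam as a default argument so each round uses its own weight.
    grad_fn = lambda x, lam=lam: (x - target) + lam * (x - neighbor_avg)
    theta, velocity = nesterov_step(theta, velocity, grad_fn)

print("final theta:", theta)
```

In this toy setting the iterate settles at the minimizer of the final-stage objective, a weighted average of `target` and `neighbor_avg`. The schedule starts at zero so that early rounds are unregularized, mirroring the transition the framework describes.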

Findings

01. Achieves 98.7% reduction in communication compared to NTK-DFL.

02. Reaches target performance three times faster with momentum.

03. Establishes state-of-the-art results in communication-efficient decentralized learning.

Abstract

Decentralized federated learning (DFL) faces critical challenges from statistical heterogeneity and communication overhead. While NTK-based methods achieve faster convergence, transmitting full Jacobian matrices is impractical for bandwidth-constrained edge networks. We propose SPARK, synergistically integrating random projection-based Jacobian compression, stage-wise annealed distillation, and Nesterov momentum acceleration. Random projections compress Jacobians while preserving spectral properties essential for convergence. Stage-wise annealed distillation transitions from pure NTK evolution to neighbor-regularized learning, counteracting compression noise. Nesterov momentum accelerates convergence through stable accumulation enabled by distillation smoothing. SPARK achieves 98.7% communication reduction compared to NTK-DFL while maintaining convergence speed and superior accuracy.…
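
To make the compression idea concrete, here is a minimal sketch of random-projection Jacobian compression. It assumes a Gaussian projection matrix regenerated from a shared seed (so only the compressed Jacobian would need to be sent), and it checks how well the empirical NTK Gram matrix J Jᵀ is preserved. The helper names, the dimensions, and the shared-seed scheme are assumptions for illustration, not details confirmed by the abstract.

```python
import numpy as np

def make_projection(p, k, seed):
    """Gaussian random projection R of shape (p, k), scaled so that
    E[R @ R.T] is the identity. A shared seed lets every node rebuild R locally."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((p, k)) / np.sqrt(k)

def compress_jacobian(J, k, seed=0):
    """Project an (n, p) Jacobian down to (n, k) columns."""
    R = make_projection(J.shape[1], k, seed)
    return J @ R

# Synthetic stand-in for a per-sample Jacobian (n samples, p parameters).
rng = np.random.default_rng(42)
n, p, k = 8, 4096, 256
J = rng.standard_normal((n, p))

J_c = compress_jacobian(J, k, seed=0)

# Compare the exact Gram matrix with the one computed from compressed Jacobians.
K = J @ J.T
K_c = J_c @ J_c.T
rel_err = np.linalg.norm(K - K_c) / np.linalg.norm(K)

# Fraction of floats saved by sending J_c instead of J.
reduction = 1.0 - J_c.size / J.size

print(f"relative Gram error: {rel_err:.3f}")
print(f"communication reduction: {reduction:.2%}")
```

The ratio k/p used here is arbitrary and is not meant to reproduce the 98.7% figure. A smaller k saves more bandwidth but increases the error in the approximate Gram matrix, which is the compression noise that the distillation stage is described as counteracting.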

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications