SPARK: Igniting Communication-Efficient Decentralized Learning via Stage-wise Projected NTK and Accelerated Regularization
Li Xia

TL;DR
SPARK is a decentralized learning method that combines Jacobian compression, stage-wise distillation, and momentum acceleration to cut communication costs substantially (98.7% less than NTK-DFL) while maintaining high accuracy and fast convergence.
Contribution
The paper presents a new framework integrating random projection, annealed distillation, and Nesterov momentum to enhance communication efficiency and convergence in decentralized federated learning.
Findings
Achieves a 98.7% reduction in communication compared to NTK-DFL.
Reaches the target performance three times faster with Nesterov momentum.
Establishes state-of-the-art results in communication-efficient decentralized learning.
Abstract
Decentralized federated learning (DFL) faces critical challenges from statistical heterogeneity and communication overhead. While NTK-based methods achieve faster convergence, transmitting full Jacobian matrices is impractical for bandwidth-constrained edge networks. We propose SPARK, which integrates random projection-based Jacobian compression, stage-wise annealed distillation, and Nesterov momentum acceleration. Random projections compress Jacobians while preserving the spectral properties essential for convergence. Stage-wise annealed distillation transitions from pure NTK evolution to neighbor-regularized learning, counteracting compression noise. Nesterov momentum accelerates convergence, and the smoothing provided by distillation keeps its accumulation stable. SPARK achieves a 98.7% communication reduction compared to NTK-DFL while maintaining convergence speed and delivering superior accuracy.…
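The abstract names three mechanisms without giving their exact form, so the following is a toy, plain-Python sketch under stated assumptions, not the authors' implementation. (1) Jacobian compression is modeled as a Gaussian sketch J ↦ JR, with R a p × k matrix of i.i.d. N(0, 1/k) entries that every node is assumed to regenerate from a shared seed. Because E[RRᵀ] = I, the projected Gram matrix JRRᵀJᵀ approximates the empirical NTK JJᵀ, and by Weyl's inequality each eigenvalue moves by at most the spectral norm of the difference, which is one concrete sense of "preserving spectral properties". (2) The stage-wise annealing is modeled as a hypothetical linear ramp of the regularization weight that is zero in the first stage (the pure NTK phase). The neighbor term is a parameter-space pull toward a hypothetical neighbor value of 1.0, standing in for the distillation loss. (3) The momentum step is the standard Nesterov lookahead update. All helper names (`gaussian_projection`, `compress_jacobian`, `distill_weight`, `nesterov_step`, `toy_grad`) and all sizes are hypothetical.

```python
import math
import random


def gaussian_projection(p, k, seed):
    """Hypothetical helper: p x k matrix with i.i.d. N(0, 1/k) entries.

    E[R R^T] = I_p, so J R R^T J^T approximates J J^T. Nodes are assumed to
    regenerate R from a shared seed, so R itself is never transmitted.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]


def matmul(a, b):
    """Plain-Python product of an (n x m) and an (m x q) matrix."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gram(j):
    """Empirical NTK Gram matrix J J^T of an (n x p) Jacobian J."""
    return matmul(j, [list(col) for col in zip(*j)])


def compress_jacobian(j, k, seed):
    """Project an (n x p) Jacobian to (n x k); only n*k floats are sent."""
    return matmul(j, gaussian_projection(len(j[0]), k, seed))


def relative_error(a, b):
    """Relative Frobenius error ||A - B||_F / ||A||_F."""
    num = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = sum(x ** 2 for ra in a for x in ra)
    return math.sqrt(num / den)


def distill_weight(step, steps_per_stage, num_stages, lam_max):
    """Hypothetical stage-wise schedule: 0 in stage 0, linear ramp to lam_max."""
    if num_stages <= 1:
        return lam_max
    stage = min(step // steps_per_stage, num_stages - 1)
    return lam_max * stage / (num_stages - 1)


def nesterov_step(theta, velocity, grad_fn, lr, mu):
    """Lookahead update: v <- mu*v - lr*grad(theta + mu*v); theta <- theta + v."""
    g = grad_fn(theta + mu * velocity)
    velocity = mu * velocity - lr * g
    return theta + velocity, velocity


def toy_grad(theta, lam):
    """Gradient of 0.5*(theta - 3)^2 + 0.5*lam*(theta - 1)^2 (toy objective)."""
    return (theta - 3.0) + lam * (theta - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, k = 4, 1000, 100  # toy sizes, not taken from the paper
    jac = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    sketch = compress_jacobian(jac, k, seed=42)
    err = relative_error(gram(jac), gram(sketch))
    print(f"payload reduction: {1 - k / p:.1%}")  # payload reduction: 90.0%
    print(f"relative NTK Gram error: {err:.3f}")

    # Three stages of 100 steps: lam = 0, then 0.5, then 1.0.
    theta, vel = 0.0, 0.0
    for t in range(300):
        lam = distill_weight(t, steps_per_stage=100, num_stages=3, lam_max=1.0)
        theta, vel = nesterov_step(theta, vel, lambda th: toy_grad(th, lam), lr=0.1, mu=0.9)
    print(f"final theta: {theta:.3f}")  # final theta: 2.000
```

Each transmitted sketch holds n·k instead of n·p floats, so the payload shrinks by 1 − k/p; the toy sizes above give 90%. The 98.7% figure is the authors' reported result and is not something this sketch reproduces. In the toy run, the iterate first settles at the local optimum and is then pulled to 2.0, the minimizer once the neighbor term reaches full weight.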
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
