Scalable Dual Coordinate Descent for Kernel Methods
Zishan Shao, Aditya Devarakonda

TL;DR
This paper introduces $s$-step variants of dual coordinate descent methods for kernel SVMs and ridge regression, significantly reducing communication costs and achieving near-linear speedups on large-scale distributed systems.
Contribution
The paper develops and analyzes $s$-step variants of DCD and BDCD methods that reduce communication frequency, improving scalability for kernel methods on distributed hardware.
Findings
Achieved up to 9.8x speedup with 512 cores.
Maintained numerical stability for large $s$ values.
Bounded computation and communication costs theoretically.
Abstract
Dual Coordinate Descent (DCD) and Block Dual Coordinate Descent (BDCD) are important iterative methods for solving convex optimization problems. In this work, we develop scalable DCD and BDCD methods for the kernel support vector machines (K-SVM) and kernel ridge regression (K-RR) problems. On distributed-memory parallel machines, the scalability of these methods is limited by the need to communicate every iteration. On modern hardware, where communication is orders of magnitude more expensive than computation, the running time of the DCD and BDCD methods is dominated by communication cost. We address this communication bottleneck by deriving $s$-step variants of DCD and BDCD for solving the K-SVM and K-RR problems, respectively. The $s$-step variants reduce the frequency of communication by a tunable factor of $s$ at the expense of additional bandwidth and computation. The $s$-step variants compute the…
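The $s$-step idea for K-RR can be illustrated with a small single-process sketch. It is not the authors' implementation: it assumes the dual system $(K + \lambda I)\alpha = y$ (the paper's scaling may differ), an RBF kernel, equal-sized blocks with distinct indices inside each block, and hypothetical function names (`rbf_kernel`, `bdcd_krr`, `sstep_bdcd_krr`). The kernel-row computation stands in for the step that would need communication on a distributed machine: the classical loop performs it once per block, the $s$-step loop once per group of $s$ blocks, at the cost of a larger kernel slab and extra correction arithmetic.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Pairwise RBF kernel between rows of X and rows of Z.
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def bdcd_krr(X, y, lam, blocks, gamma=1.0):
    # Classical BDCD on (K + lam*I) alpha = y: one kernel slab per block.
    alpha = np.zeros(len(y))
    for B in blocks:
        K_B = rbf_kernel(X[B], X, gamma)  # would be one communication round
        resid = y[B] - K_B @ alpha - lam * alpha[B]
        delta = np.linalg.solve(K_B[:, B] + lam * np.eye(len(B)), resid)
        alpha[B] += delta
    return alpha

def sstep_bdcd_krr(X, y, lam, blocks, s, gamma=1.0):
    # s-step variant: one kernel slab per group of s blocks, then the
    # s block updates are recovered locally through a correction recurrence.
    alpha = np.zeros(len(y))
    for start in range(0, len(blocks), s):
        group = blocks[start:start + s]
        idx = np.concatenate(group)
        b = len(group[0])                     # assumed equal block size
        K_S = rbf_kernel(X[idx], X, gamma)    # single round for the whole group
        r = y[idx] - K_S @ alpha - lam * alpha[idx]
        # Gram matrix of the selected rows; the equality mask handles
        # indices that repeat across blocks within the group.
        M = K_S[:, idx] + lam * (idx[:, None] == idx[None, :])
        delta = np.zeros(len(idx))
        for j in range(len(group)):
            lo, hi = j * b, (j + 1) * b
            # Subtract the effect of the earlier updates in this group.
            rhs = r[lo:hi] - M[lo:hi, :lo] @ delta[:lo]
            delta[lo:hi] = np.linalg.solve(M[lo:hi, lo:hi], rhs)
        np.add.at(alpha, idx, delta)          # accumulates repeated indices
    return alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 5))
y = rng.standard_normal(60)
blocks = [rng.choice(60, size=4, replace=False) for _ in range(24)]
a_classic = bdcd_krr(X, y, 1.0, blocks)
a_sstep = sstep_bdcd_krr(X, y, 1.0, blocks, s=8)
print(np.max(np.abs(a_classic - a_sstep)))
```

Given the same block sequence, both loops produce the same iterates in exact arithmetic, so the printed difference should be at the level of floating-point rounding. The Gram matrix `M` grows to $sb \times sb$ for block size $b$, which is the additional bandwidth and computation the abstract refers to.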
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
