Fast training of large kernel models with delayed projections
Amirhesam Abedsoltan, Siyuan Ma, Parthe Pandit, Mikhail Belkin

TL;DR
This paper introduces EigenPro4, a scalable training algorithm for kernel machines that adds delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing larger models and datasets to be trained faster at comparable accuracy.
Contribution
The paper presents a delayed projection technique integrated into PSGD that improves the scalability and speed of kernel machine training.
Findings
EigenPro4 achieves faster training times on multiple datasets.
The method maintains or improves classification accuracy.
It enables training of larger kernel models than previously possible.
Abstract
Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating a drastic training speedup over existing methods while maintaining comparable or better classification accuracy.
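The abstract names the idea but not the mechanics, so the following is only an illustrative sketch of what "delayed projection" can look like, not the authors' EigenPro4 implementation. It assumes a kernel model with fixed centers Z, and it swaps in plain (unpreconditioned) mini-batch SGD in place of PSGD to stay short. Each step adds the batch points as temporary centers, and only every T steps is the accumulated update projected back onto the span of the model centers. All names (gaussian_kernel, project_onto_centers, train_delayed_projection, predict) and all parameter values (bandwidth, lr, T, ridge) are hypothetical choices made for this example.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def project_onto_centers(alpha, K_zz, Z, tmp_X, tmp_w, bandwidth):
    """Fold the temporary centers back into the span of the model centers Z.

    The RKHS projection of sum_j w_j K(., x_j) onto span{K(., z_i)} has
    coefficients K_zz^{-1} K_zx w (K_zz carries a small ridge for stability).
    """
    K_zx = gaussian_kernel(Z, np.vstack(tmp_X), bandwidth)
    return alpha + np.linalg.solve(K_zz, K_zx @ np.concatenate(tmp_w))


def train_delayed_projection(X, y, Z, bandwidth=0.5, epochs=20, batch_size=32,
                             lr=0.5, T=5, ridge=1e-6, seed=0):
    """Mini-batch kernel SGD with a projection every T steps (assumed scheme).

    Note: no preconditioner is used here; this is plain SGD, not PSGD.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape[0], Z.shape[0]
    K_zz = gaussian_kernel(Z, Z, bandwidth) + ridge * np.eye(p)
    alpha = np.zeros(p)          # weights on the fixed model centers
    tmp_X, tmp_w = [], []        # temporary centers and their weights
    step = 0
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            # Prediction uses the model centers plus any pending temporary centers.
            pred = gaussian_kernel(xb, Z, bandwidth) @ alpha
            if tmp_X:
                K_bt = gaussian_kernel(xb, np.vstack(tmp_X), bandwidth)
                pred += K_bt @ np.concatenate(tmp_w)
            # Functional gradient step: the batch points become temporary centers.
            tmp_X.append(xb)
            tmp_w.append(-lr / len(idx) * (pred - yb))
            step += 1
            if step % T == 0:    # the delayed projection
                alpha = project_onto_centers(alpha, K_zz, Z, tmp_X, tmp_w, bandwidth)
                tmp_X, tmp_w = [], []
    if tmp_X:                    # flush whatever is still pending
        alpha = project_onto_centers(alpha, K_zz, Z, tmp_X, tmp_w, bandwidth)
    return alpha


def predict(X, Z, alpha, bandwidth=0.5):
    """Evaluate the projected model, which depends only on the centers Z."""
    return gaussian_kernel(X, Z, bandwidth) @ alpha
```

In this sketch, setting T=1 projects after every step, while a larger T amortizes the cost of the linear solve over several cheaper gradient steps. That trade-off is the intuition behind delaying the projection; the paper itself should be consulted for the actual algorithm and its preconditioner.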
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Speech Recognition and Synthesis · Neural Networks and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
