Fast training of large kernel models with delayed projections

Amirhesam Abedsoltan; Siyuan Ma; Parthe Pandit; Mikhail Belkin

arXiv:2411.16658·stat.ML·November 26, 2024


TL;DR

This paper introduces EigenPro4, a scalable training algorithm for kernel machines that adds delayed projections to Preconditioned Stochastic Gradient Descent (PSGD). It makes larger models and datasets practical to train, with faster training and comparable or better classification accuracy.

Contribution

The paper presents a delayed projection technique for PSGD that improves the scalability and speed of kernel machine training.

Findings

01 EigenPro4 trains faster than existing methods on multiple datasets.

02 The method maintains or improves classification accuracy.

03 It enables training of larger kernel models than was previously feasible.

Abstract

Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible and pushing the practical limits of kernel-based learning. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating a drastic training speedup over existing methods while maintaining comparable or better classification accuracy.
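
To make the idea of a delayed projection concrete, here is a minimal NumPy sketch. It is not the authors' EigenPro4 implementation and it omits the preconditioner, so it is plain stochastic gradient descent rather than PSGD. It assumes a kernel model with fixed centers `Z`, assumes each mini-batch step adds the batch points as temporary centers, and assumes the temporary part is projected back onto the span of `Z` once every `T` steps by solving a kernel linear system. All names (`train_delayed_projection`, `T`, `jitter`, the Gaussian kernel, the toy data) are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=2.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def train_delayed_projection(X, y, Z, T=5, batch_size=50, epochs=20,
                             lr=1.0, jitter=1e-6, seed=0):
    """Kernel SGD on squared loss, projecting onto span{K(., z)} every T steps.

    Illustrative only: no preconditioning, and the use of temporary centers
    is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    alpha = np.zeros(len(Z))                      # weights on fixed centers Z
    K_zz = gaussian_kernel(Z, Z) + jitter * np.eye(len(Z))
    X_tmp, beta_tmp = [], []                      # temporary centers and weights

    def project(alpha):
        # RKHS projection of the temporary part onto span of the centers:
        # solve K(Z, Z) delta = K(Z, X_tmp) beta_tmp, then fold delta into alpha.
        Xt, bt = np.vstack(X_tmp), np.concatenate(beta_tmp)
        return alpha + np.linalg.solve(K_zz, gaussian_kernel(Z, Xt) @ bt)

    step = 0
    for _ in range(epochs):
        perm = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Current prediction = fixed-center part + not-yet-projected part.
            pred = gaussian_kernel(Xb, Z) @ alpha
            if X_tmp:
                pred += gaussian_kernel(Xb, np.vstack(X_tmp)) @ np.concatenate(beta_tmp)
            # Functional gradient step: the batch points become temporary centers.
            X_tmp.append(Xb)
            beta_tmp.append(-(lr / len(idx)) * (pred - yb))
            step += 1
            if step % T == 0:                     # the delayed projection
                alpha = project(alpha)
                X_tmp, beta_tmp = [], []
    if X_tmp:                                     # flush any leftover steps
        alpha = project(alpha)
    return alpha


# Toy data (synthetic, for illustration only).
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(500, 5))
y = np.sin(X[:, 0])
Z = X[:100]                                       # model centers: a subset of the data

alpha = train_delayed_projection(X, y, Z)
mse_before = np.mean(y**2)                        # error of the zero model
mse_after = np.mean((gaussian_kernel(X, Z) @ alpha - y) ** 2)
print(f"train MSE: {mse_before:.3f} -> {mse_after:.3f}")
```

Setting `T=1` in this sketch projects after every step, so the linear solve against `K(Z, Z)` is paid once per mini-batch. A larger `T` spreads that cost over several steps, at the price of evaluating the kernel against the accumulated temporary centers in the meantime.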

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Speech Recognition and Synthesis · Neural Networks and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings