On the Nystrom Approximation for Preconditioning in Kernel Machines

Amirhesam Abedsoltan; Parthe Pandit; Luis Rademacher; Mikhail Belkin

arXiv:2312.03311 · stat.ML · January 26, 2024 · 1 citation


TL;DR

This paper analyzes the use of Nystrom approximations for spectral preconditioning in kernel machine training, showing that logarithmic-sized samples can nearly match exact preconditioners in accelerating convergence while reducing costs.

Contribution

It provides a theoretical analysis of the trade-offs involved in using Nystrom-based approximated preconditioners for kernel methods, demonstrating their efficiency and effectiveness.

Findings

1. Logarithmic sample size suffices for effective preconditioning.
2. Nystrom approximation nearly matches exact preconditioner in accelerating gradient descent.
3. Significant reduction in computational and storage costs achieved.

Abstract

Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral preconditioner can be expensive, which can lead to large computational and storage overheads, precluding the application of kernel methods to problems with large datasets. A Nystrom approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset)…
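To make the idea in the abstract concrete, here is a minimal NumPy sketch of a Nystrom-approximated spectral preconditioner. It is an illustration under stated assumptions, not the paper's algorithm: the Gaussian kernel, the damping form P = I - V diag(1 - lam_{q+1}/lam_i) V^T for the top q eigendirections, the QR orthonormalization step, and every function and parameter name (`nystrom_preconditioner`, `s`, `q`, `bandwidth`, `demo`) are choices made for this example. The preconditioner is built from a subsample of `s` points only, and the sketch compares the largest eigenvalue of the kernel matrix with and without it, since that eigenvalue limits the usable step size.

```python
import numpy as np


def gaussian_kernel(X, Z, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Z."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def nystrom_preconditioner(X, s, q, rng, bandwidth=1.0):
    """Approximate the top-q eigendirections of K from s subsampled points.

    Returns V (n x q, orthonormal columns) and damping weights w in [0, 1),
    so that P g = g - V (w * (V^T g)). The damping form is an assumption.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=s, replace=False)
    K_ss = gaussian_kernel(X[idx], X[idx], bandwidth)
    lam, U = np.linalg.eigh(K_ss)            # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]           # reorder to descending
    K_ns = gaussian_kernel(X, X[idx], bandwidth)
    V = K_ns @ (U[:, :q] / lam[:q])          # Nystrom extension to all n points
    V, _ = np.linalg.qr(V)                   # orthonormalize (illustrative choice)
    w = 1.0 - lam[q] / lam[:q]               # shrink top-q directions to level q+1
    return V, w


def preconditioned_richardson(K, y, V, w, step, iters):
    """Iterate alpha <- alpha - step * P (K alpha - y), starting from zero."""
    alpha = np.zeros_like(y)
    for _ in range(iters):
        g = K @ alpha - y
        alpha -= step * (g - V @ (w * (V.T @ g)))
    return alpha


def demo(seed=0, n=200, d=3, s=50, q=10, iters=200):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    K = gaussian_kernel(X, X)
    y = K @ rng.standard_normal(n)           # synthetic, realizable targets
    V, w = nystrom_preconditioner(X, s, q, rng)
    lam_plain = np.linalg.eigvalsh(K)[-1]
    # P^{1/2} K P^{1/2} is symmetric and has the same eigenvalues as P K.
    sqrtP = np.eye(n) - (V * (1.0 - np.sqrt(1.0 - w))) @ V.T
    lam_pre = np.linalg.eigvalsh(sqrtP @ K @ sqrtP)[-1]
    alpha = preconditioned_richardson(K, y, V, w, 1.0 / lam_pre, iters)
    return {"V": V, "w": w, "lam_plain": lam_plain,
            "lam_pre": lam_pre, "alpha": alpha}


if __name__ == "__main__":
    out = demo()
    print("largest eigenvalue without preconditioner:", out["lam_plain"])
    print("largest eigenvalue with Nystrom preconditioner:", out["lam_pre"])
```

Because the weights lie in [0, 1), this P never increases the top eigenvalue, so a step size of `1 / lam_pre` is at least as large as the unpreconditioned `1 / lam_plain`. How close a subsample-based P comes to the exact spectral preconditioner depends on the sample size `s`, which is the trade-off the abstract says the paper analyzes.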

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Machine Learning and Algorithms