Scalable Kernel Methods via Doubly Stochastic Gradients
Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song

TL;DR
This paper introduces a scalable kernel method using doubly stochastic gradients, enabling kernel techniques to handle large datasets efficiently and compete with neural networks in various large-scale tasks.
Contribution
The authors propose a novel doubly stochastic gradient approach that scales kernel methods to large datasets, avoiding support vector storage and reducing memory requirements.
Findings
Converges to the optimal function in the reproducing kernel Hilbert space at rate O(1/t), where t is the number of iterations.
Attains generalization performance of O(1/√t).
Performs competitively with neural networks on large datasets.
Abstract
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space in rate O(1/t), and achieves a generalization performance…
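The two stochastic approximations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a squared loss, a Gaussian RBF kernel approximated by random Fourier features, an L2 regularizer, and a step size decaying as 1/t, none of which are specified in the text above. The names random_feature, predict, train, reg, step0, and sigma are hypothetical. Each random function is regenerated from its integer seed instead of being stored, so only one coefficient per iteration is kept and no training points need to be retained as support vectors.

```python
import numpy as np


def random_feature(x, seed, sigma=1.0):
    """Evaluate the random Fourier feature identified by `seed` at x.

    Assumption: Gaussian RBF kernel exp(-||x - z||^2 / (2 sigma^2)).
    The feature is rebuilt from its seed, so it never has to be stored.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(scale=1.0 / sigma, size=x.shape[-1])
    b = rng.uniform(0.0, 2.0 * np.pi)
    return np.sqrt(2.0) * np.cos(x @ omega + b)


def predict(x, alphas, sigma=1.0):
    """f(x) = sum_j alpha_j * phi_j(x), regenerating phi_j from seed j."""
    return sum(a * random_feature(x, j, sigma) for j, a in enumerate(alphas))


def train(X, y, n_iters, reg=1e-3, step0=1.0, sigma=1.0, data_seed=0):
    """Doubly stochastic functional gradient descent (illustrative sketch)."""
    data_rng = np.random.default_rng(data_seed)
    alphas = np.zeros(n_iters)
    for i in range(n_iters):
        # First source of randomness: one random training point.
        k = data_rng.integers(len(X))
        f_xi = predict(X[k], alphas[:i], sigma)
        gamma = step0 / (1.0 + i)      # assumed 1/t step-size decay
        grad = f_xi - y[k]             # derivative of 0.5 * (f(x) - y)^2
        # The regularizer shrinks all previously stored coefficients.
        alphas[:i] *= 1.0 - gamma * reg
        # Second source of randomness: one random function (seed i).
        alphas[i] = -gamma * grad * random_feature(X[k], i, sigma)
    return alphas
```

A call such as alphas = train(X, y, n_iters=200) followed by predict(X_new, alphas) evaluates the learned function. Because every prediction regenerates all earlier random features, the cost of iteration i grows linearly in i; with a single point and a single feature per step the estimate is also noisy, so mini-batches of points and features are a natural extension of this sketch.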
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
