Scalable Kernel Methods via Doubly Stochastic Gradients

Bo Dai; Bo Xie; Niao He; Yingyu Liang; Anant Raj; Maria-Florina Balcan; Le Song

arXiv:1407.5599·cs.LG·September 11, 2015·74 cites

TL;DR

This paper introduces a scalable kernel method using doubly stochastic gradients, enabling kernel techniques to handle large datasets efficiently and compete with neural networks in various large-scale tasks.

Contribution

The authors propose a novel doubly stochastic gradient approach that scales kernel methods to large datasets, avoiding support vector storage and reducing memory requirements.
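The memory saving mentioned above can be illustrated with a minimal sketch. It assumes a Gaussian RBF kernel approximated by random Fourier features, a squared loss, and a step size of the form theta / t; none of these choices are stated on this page. The function names (`random_feature`, `predict`, `train`) and all parameters are hypothetical. The point of the sketch is that only one scalar coefficient per iteration is stored: each random feature is regenerated from its seed when needed, so neither training points nor feature parameters are kept.

```python
import math
import random


def random_feature(index, x, sigma=1.0):
    """Evaluate the index-th random Fourier feature at point x.

    The frequency vector and phase are regenerated from a seed derived
    from `index`, so they never need to be stored (assumed RBF kernel).
    """
    rng = random.Random(f"feature-{index}")
    omega = [rng.gauss(0.0, 1.0 / sigma) for _ in x]
    phase = rng.uniform(0.0, 2.0 * math.pi)
    dot = sum(w * xi for w, xi in zip(omega, x))
    return math.sqrt(2.0) * math.cos(dot + phase)


def predict(x, alphas, sigma=1.0):
    """f(x) = sum_i alpha_i * phi_i(x); features are rebuilt on the fly."""
    return sum(a * random_feature(i, x, sigma) for i, a in enumerate(alphas))


def train(X, y, n_iters, sigma=1.0, nu=1e-4, theta=1.0, data_seed=12345):
    """Doubly stochastic functional gradient descent for squared loss.

    Each iteration samples one training point and one random feature.
    Only the list of coefficients `alphas` is kept in memory.
    """
    data_rng = random.Random(data_seed)
    alphas = []
    for t in range(n_iters):
        i = data_rng.randrange(len(X))           # random training point
        x_t, y_t = X[i], y[i]
        residual = predict(x_t, alphas, sigma) - y_t  # d/df of 0.5 * (f - y)^2
        gamma = theta / (t + 1)                  # decaying step size (assumed)
        # Shrink old coefficients (regularization), then add the new one.
        alphas = [(1.0 - gamma * nu) * a for a in alphas]
        alphas.append(-gamma * residual * random_feature(t, x_t, sigma))
    return alphas


# Toy usage: fit sin(x) on a small 1-D grid.
X_demo = [[-3.0 + 6.0 * k / 19.0] for k in range(20)]
y_demo = [math.sin(x[0]) for x in X_demo]
alphas_demo = train(X_demo, y_demo, n_iters=50)
print(len(alphas_demo), predict([0.5], alphas_demo))
```

Evaluating the function this way costs one feature regeneration per past iteration, which trades computation for memory; a practical implementation would likely batch the features and data points rather than process them one at a time.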

Findings

01. Achieves a convergence rate of $O(1/t)$ to the optimal function in the reproducing kernel Hilbert space, where $t$ is the number of iterations.

02. Attains generalization performance of $O(1/\sqrt{t})$.

03. Performs competitively with neural networks on large datasets.

Abstract

The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems, and we solve the problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending using this noisy functional gradient. We show that a function produced by this procedure after $t$ iterations converges to the optimal function in the reproducing kernel Hilbert space in rate $O(1/t)$, and achieves a generalization performance…
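The two unbiased approximations described in the abstract can be written out as follows. This is a sketch in assumed notation that does not appear on this page: $\ell$ is a convex loss with derivative $\ell'$ in its first argument, $k(x, x') = \mathbb{E}_\omega[\phi_\omega(x)\,\phi_\omega(x')]$ is the kernel expressed through random functions $\phi_\omega$, $\gamma_t$ is a step size, and $\nu$ is a regularization constant.

```latex
\begin{align*}
\nabla_f\, \mathbb{E}_{(x,y)}\big[\ell(f(x), y)\big]
  &= \mathbb{E}_{(x,y)}\big[\ell'(f(x), y)\, k(x, \cdot)\big]
  && \text{(exact functional gradient)} \\
\xi(\cdot)
  &= \ell'(f(x), y)\, k(x, \cdot)
  && \text{(one random training point)} \\
\zeta(\cdot)
  &= \ell'(f(x), y)\, \phi_\omega(x)\, \phi_\omega(\cdot)
  && \text{(random point and random function)} \\
f_{t+1}
  &= f_t - \gamma_t \big(\zeta_t + \nu f_t\big)
  && \text{(noisy descent step)}
\end{align*}
```

Taking the expectation of $\zeta$ over $\omega$ recovers $\xi$, and taking the expectation of $\xi$ over $(x, y)$ recovers the exact functional gradient, which is the sense in which both approximations are unbiased.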

Code & Models

Repositories

zixu1986/Doubly_Stochastic_Gradients

Videos

Scalable Kernel Methods via Doubly Stochastic Gradients · YouTube

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques