Doubly stochastic large scale kernel learning with the empirical kernel map
Nikolaas Steenbergen, Sebastian Schelter, Felix Bießmann

TL;DR
This paper introduces a scalable kernel learning method based on doubly stochastic optimization of the empirical kernel map, which allows classical kernel functions to be used on large datasets without explicitly formulating a kernel map approximation.
Contribution
It presents a simple, parallelizable algorithm that scales kernel methods to large datasets by optimizing the empirical kernel map directly, avoiding kernel matrix approximations.
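For orientation, the empirical kernel map and the doubly sampled prediction it leads to can be written as below. This is a sketch in assumed notation: the symbols N, alpha, I, and J are introduced here for illustration and are not taken from the paper.

```latex
% Empirical kernel map: represent x by its kernel evaluations against all N training points.
\phi_{\mathrm{emp}}(x) = \bigl(k(x, x_1), \dots, k(x, x_N)\bigr)^{\top} \in \mathbb{R}^{N},
\qquad
f(x) = \sum_{j=1}^{N} \alpha_j \, k(x, x_j) = \alpha^{\top} \phi_{\mathrm{emp}}(x)

% Doubly stochastic step: sample index sets I (gradient points) and J (expansion points),
% and evaluate only the |I| x |J| kernel block instead of the full N x N matrix.
f_J(x_i) = \sum_{j \in J} \alpha_j \, k(x_i, x_j), \qquad i \in I
```

Under this reading, each step costs on the order of |I| · |J| kernel evaluations, independent of N, while every data point can still be drawn into either sample over the course of training.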
Findings
Empirical evidence that the algorithm works on large datasets
Uses classical kernel functions directly, with no explicit kernel map approximation
Easily implementable in parallel computing environments
Abstract
With the rise of big data sets, the popularity of kernel methods declined and neural networks took over again. The main problem with kernel methods is that the kernel matrix grows quadratically with the number of data points. Most attempts to scale up kernel methods solve this problem by discarding data points or basis functions of some approximation of the kernel map. Here we present a simple yet effective alternative for scaling up kernel methods that takes into account the entire data set via doubly stochastic optimization of the empirical kernel map. The algorithm is straightforward to implement, in particular in parallel execution settings; it leverages the full power and versatility of classical kernel functions without the need to explicitly formulate a kernel map approximation. We provide empirical evidence that the algorithm works on large data sets.
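The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the hinge loss, the RBF kernel, the step-size schedule, and all names (`rbf_kernel`, `fit_doubly_stochastic`, `predict`, `n_grad`, `n_expand`, `lam`, `lr`) are assumptions chosen for illustration. Each iteration draws one random sample of points for the gradient and a second, independent sample of points for the kernel expansion, so only a small kernel block is ever computed.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel block between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_doubly_stochastic(X, y, n_iter=300, n_grad=32, n_expand=32,
                          gamma=1.0, lam=1e-4, lr=1.0, seed=0):
    """Learn one coefficient per training point; labels y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    alpha = np.zeros(n)
    for t in range(1, n_iter + 1):
        # First source of randomness: points used to estimate the gradient.
        I = rng.choice(n, size=min(n_grad, n), replace=False)
        # Second source of randomness: points used in the kernel expansion.
        J = rng.choice(n, size=min(n_expand, n), replace=False)
        K = rbf_kernel(X[I], X[J], gamma)          # shape (|I|, |J|)
        margins = y[I] * (K @ alpha[J])
        active = margins < 1.0                     # hinge loss is active here
        # Subgradient of the hinge loss plus L2 penalty, restricted to alpha[J].
        grad = lam * alpha[J] - (K[active] * y[I][active, None]).sum(0) / len(I)
        alpha[J] -= (lr / np.sqrt(t)) * grad       # assumed decaying step size
    return alpha


def predict(X_train, alpha, X_new, gamma=1.0, batch=256):
    """Evaluate the full kernel expansion in batches to bound memory use."""
    scores = np.zeros(X_new.shape[0])
    for s in range(0, X_train.shape[0], batch):
        block = rbf_kernel(X_new, X_train[s:s + batch], gamma)
        scores += block @ alpha[s:s + batch]
    return np.sign(scores)
```

Because the update in each iteration only touches `alpha[J]`, workers that draw disjoint index sets could in principle run such updates concurrently, which is one plausible reading of the parallel-execution claim.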
Taxonomy
Topics: Face and Expression Recognition · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
