On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees
Shahin Shahrampour, Soheil Kolouri

TL;DR
This paper studies sampling random features from empirical leverage scores for kernel approximation, establishing an out-of-sample performance bound and showing experimentally that the approach outperforms vanilla Monte Carlo sampling.
Contribution
It introduces a practical approach for data-dependent sampling of random features using empirical leverage scores, with theoretical performance bounds and empirical validation.
Findings
Empirical leverage score sampling outperforms Monte Carlo sampling in experiments.
The method is competitive with supervised kernel learning without using label information.
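The out-of-sample bound reveals a trade-off between the approximated kernel and the eigenvalue decay of a second kernel, defined on the domain of random features from the data distribution.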
Abstract
Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of features required to achieve optimal learning bounds. Leverage scores introduce an optimized distribution for features based on an infinite-dimensional integral operator (depending on input distribution), which is impractical to sample from. Focusing on empirical leverage scores in this paper, we establish an out-of-sample performance bound, revealing an interesting trade-off between the approximated kernel and the eigenvalue decay of another kernel in the domain of random features defined based on data distribution. Our experiments verify that the empirical algorithm consistently outperforms vanilla Monte Carlo sampling, and with a minor modification the…
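To make the idea of sampling from empirical leverage scores concrete, here is a minimal sketch rather than the authors' algorithm. It assumes cosine random Fourier features for a Gaussian kernel and one common form of ridge leverage scores, the diagonal of G (G + λI)^{-1} with G = ZᵀZ / n computed on a candidate pool of features. The function names, the regularizer `lam`, the pool size `m0`, and the importance reweighting in the last line are all illustrative assumptions and do not come from the paper.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X (n x d) to cosine features sqrt(2) * cos(X W + b), shape (n x m0)."""
    return np.sqrt(2.0) * np.cos(X @ W + b)

def empirical_leverage_scores(Z, lam):
    """Assumed score definition: diagonal of G (G + lam I)^{-1}, G = Z^T Z / n."""
    n, m0 = Z.shape
    G = Z.T @ Z / n
    # (G + lam I)^{-1} G has the same diagonal as G (G + lam I)^{-1}.
    H = np.linalg.solve(G + lam * np.eye(m0), G)
    # Clip tiny negative values caused by floating-point rounding.
    return np.clip(np.diag(H), 0.0, None)

def sample_features(scores, m, rng):
    """Draw m indices from the pool with probability proportional to the scores."""
    probs = scores / scores.sum()
    idx = rng.choice(len(probs), size=m, replace=True, p=probs)
    return idx, probs

rng = np.random.default_rng(0)
n, d, m0, m = 200, 5, 100, 20           # hypothetical sizes
X = rng.standard_normal((n, d))         # synthetic inputs; no labels are used

# Candidate pool drawn by plain Monte Carlo (Gaussian kernel, unit bandwidth).
W = rng.standard_normal((d, m0))
b = rng.uniform(0.0, 2.0 * np.pi, size=m0)
Z = random_fourier_features(X, W, b)

scores = empirical_leverage_scores(Z, lam=1e-2)
idx, probs = sample_features(scores, m, rng)

# Importance reweighting so that Z_selected @ Z_selected.T matches
# Z @ Z.T / m0 in expectation (an assumed design choice).
Z_selected = Z[:, idx] / np.sqrt(m * m0 * probs[idx])
```

The scores depend only on the inputs `X`, which is consistent with the summary's point that no label information is needed. `Z_selected` could then be passed to any linear learner in place of the full feature pool.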
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
