All Random Features Representations are Equivalent

Luke Sernau; Silvano Bonacina; Rif A. Saurous

arXiv:2406.18802·cs.LG·October 25, 2024

Open Access

TL;DR

This paper proves that all random feature representations are fundamentally equivalent in approximation error when sampled optimally, simplifying the choice of representations in kernel methods.

Contribution

It derives an optimal sampling policy that equalizes and minimizes approximation error across all random feature representations.

Findings

01

All random feature representations have the same minimal approximation error under optimal sampling.

02

The optimal sampling policy achieves the lowest possible approximation error.

03

Practitioners can select any representation with confidence, provided they sample optimally.

Abstract

Random features are a powerful technique for rewriting positive-definite kernels as linear products. They bring linear tools to bear in important nonlinear domains like KNNs and attention. Unfortunately, practical implementations require approximating an expectation, usually via sampling. This has led to the development of increasingly elaborate representations with ever lower sample error. We resolve this arms race by deriving an optimal sampling policy. Under this policy all random features representations have the same approximation error, which we show is the lowest possible. This means that we are free to choose whatever representation we please, provided we sample optimally.
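
To make the "approximating an expectation, usually via sampling" step concrete, here is a minimal sketch of one well-known random features representation: random Fourier features for a Gaussian kernel. It uses plain Monte Carlo sampling and is not the optimal sampling policy derived in the paper, which the abstract does not spell out. The kernel bandwidth, the dimensions, the sample count, and the function names `gaussian_kernel` and `random_fourier_features` are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(x, y):
    # Exact kernel k(x, y) = exp(-||x - y||^2 / 2); unit bandwidth is an assumption.
    return float(np.exp(-0.5 * np.sum((x - y) ** 2)))

def random_fourier_features(X, W, b):
    # Feature map phi(x) = sqrt(2 / m) * cos(W x + b), one row per input point.
    # With rows of W drawn from N(0, I) and b from U(0, 2*pi),
    # E[phi(x) . phi(y)] equals the Gaussian kernel above.
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
d, m = 3, 20000                            # input dimension and sample count (illustrative)
W = rng.standard_normal((m, d))            # frequencies sampled from N(0, I)
b = rng.uniform(0.0, 2.0 * np.pi, size=m)  # random phases

x = np.array([0.3, -0.2, 0.5])
y = np.array([0.1, 0.4, -0.3])

phi = random_fourier_features(np.stack([x, y]), W, b)
approx = float(phi[0] @ phi[1])            # linear product of features
exact = gaussian_kernel(x, y)

print(f"exact kernel:  {exact:.4f}")
print(f"random feats:  {approx:.4f}")
```

The gap between the two printed values is the sample error the abstract refers to. It shrinks as `m` grows, and the paper's claim is that, under its optimal sampling policy, this error is the same for every representation and cannot be made lower.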

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

Topics: Image Retrieval and Classification Techniques · Neural Networks and Applications · Rough Sets and Fuzzy Logic