The Random Forest Kernel and other kernels for big data from random partitions
Alex Davies, Zoubin Ghahramani

TL;DR
This paper introduces Random Partition Kernels, including the Random Forest Kernel and Fast Cluster Kernel, which leverage random partitions for improved performance on real-world big data tasks and enable efficient large-scale inference.
Contribution
It proposes a novel class of kernels derived from random partitions, notably using Random Forests, and demonstrates their effectiveness and scalability for big data applications.
Findings
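The Random Forest Kernel and the Fast Cluster Kernel consistently outperform standard kernels on the real-world datasets tested.
The kernels admit an approximation that enables $O(N)$ inference in Gaussian Processes, Support Vector Machines, and Kernel PCA.
This approximation follows naturally from the form of the kernels and is suited to certain big data problems.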
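One way to see how a linear-time claim of this kind can arise is sketched below. It assumes the kernel is estimated as the fraction of sampled partitions in which two points share a cluster; under that assumption a kernel matrix-vector product needs only per-cluster sums, so it costs time proportional to N times the number of partitions rather than N squared. The sampler `sample_partition` (random thresholds on 1-D data) and every function name here are hypothetical illustrations, not the paper's implementation.

```python
import random
from collections import defaultdict

def sample_partition(xs, rng, num_cuts=2):
    """Hypothetical sampler: split 1-D points at random thresholds."""
    lo, hi = min(xs), max(xs)
    cuts = [rng.uniform(lo, hi) for _ in range(num_cuts)]
    # A point's label is the number of thresholds that lie below it.
    return [sum(x > c for c in cuts) for x in xs]

def kernel_matvec(partitions, v):
    """Compute K @ v without forming K, where K[i][j] is the fraction of
    partitions in which points i and j share a cluster."""
    out = [0.0] * len(v)
    for labels in partitions:
        # Sum v inside each cluster, then hand each point its cluster's sum.
        sums = defaultdict(float)
        for label, value in zip(labels, v):
            sums[label] += value
        for i, label in enumerate(labels):
            out[i] += sums[label]
    return [x / len(partitions) for x in out]

def dense_kernel(partitions):
    """Explicit N x N kernel matrix, used here only to check the fast path."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]

rng = random.Random(0)
xs = [rng.random() for _ in range(50)]
partitions = [sample_partition(xs, rng) for _ in range(20)]
v = [rng.gauss(0.0, 1.0) for _ in xs]

fast = kernel_matvec(partitions, v)
K = dense_kernel(partitions)
slow = [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
```

A product of this form is the basic operation in iterative solvers such as conjugate gradients, which is one plausible route to scalable Gaussian Process or SVM inference; whether the paper uses exactly this route is not stated in the summary above.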
Abstract
We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lends itself to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
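The connection between random partitions and kernels can be illustrated with a minimal sketch, assuming the kernel value for two objects is estimated as the fraction of sampled partitions that place them in the same cluster. The partition sampler below (assign each point to the nearest of a few randomly chosen points) is a simplified stand-in loosely in the spirit of a cluster-based kernel; it is an assumption for illustration and not the paper's Fast Cluster Kernel or Random Forest Kernel. All names and parameter values are hypothetical.

```python
import random

def sample_partition(points, num_centers, rng):
    """Hypothetical sampler: pick random points as centers and assign
    every point to its nearest center (squared Euclidean distance)."""
    centers = rng.sample(points, num_centers)

    def nearest(p):
        return min(range(num_centers),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

    return [nearest(p) for p in points]

def random_partition_kernel(points, num_partitions=200, num_centers=3, seed=0):
    """Estimate K[i][j] as the fraction of sampled partitions in which
    points i and j fall in the same cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(num_partitions):
        labels = sample_partition(points, num_centers, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / num_partitions for c in row] for row in counts]

# Two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = random_partition_kernel(points)
```

Each sampled partition contributes a block-diagonal matrix of ones, which is positive semidefinite, so their average is a valid kernel matrix. In this toy example, points in the same group receive a larger kernel value than points in different groups.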
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models · Anomaly Detection Techniques and Applications
Methods: Principal Components Analysis
