Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels
Adrien Aumon, Guy Wolf, Kevin R. Moon, Jake S. Rhodes

TL;DR
This paper recasts leaf-collision forest proximities as sparse leaf-incidence kernels. The resulting unified framework computes proximities in time that scales near-linearly with the number of samples, and its leaf-space representation supports efficient embeddings for kernel and representation learning.
Contribution
It introduces Separable Weighted Leaf-Collision (SWLC) kernels. This class unifies most existing forest proximities, gives them an explicit sparse leaf-space representation, and admits an exact, scalable method for computing the proximity matrix.
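To make the "explicit sparse leaf-space representation" concrete, here is one way to write a separable weighted leaf-collision kernel in our own notation. The weight functions a_t, b_t and the feature maps phi, psi are illustrative assumptions, not the paper's definitions. The sketch shows only that separable weights turn the kernel into an inner product of sparse leaf-indicator vectors, with one coordinate per (tree, leaf) pair.

```latex
% Assumed notation: T trees; \ell_t(x) is the leaf of tree t reached by x;
% a_t, b_t are hypothetical per-tree weight functions (may depend on x's leaf).
K(x, x') = \sum_{t=1}^{T} a_t(x)\, b_t(x')\, \mathbf{1}\left[\ell_t(x) = \ell_t(x')\right]

% Leaf-space feature maps, indexed by (tree, leaf) pairs (t, \ell):
\phi(x)_{(t,\ell)} = a_t(x)\, \mathbf{1}\left[\ell_t(x) = \ell\right], \qquad
\psi(x')_{(t,\ell)} = b_t(x')\, \mathbf{1}\left[\ell_t(x') = \ell\right]

% Each vector has at most T nonzeros, and summing over \ell recovers K:
K(x, x') = \langle \phi(x), \psi(x') \rangle, \qquad
K = \Phi \Psi^{\top}, \quad \Phi, \Psi \in \mathbb{R}^{n \times L}

% Breiman's original proximity is the case a_t = b_t = 1/\sqrt{T}:
K(x, x') = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\left[\ell_t(x) = \ell_t(x')\right]
```

When a_t = b_t, the matrix K = ΦΦᵀ is positive semidefinite, which is what gives such proximities a kernel interpretation.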
Findings
Kernel computation scales near-linearly with the number of samples.
The sparse leaf-space representation enables fast task-aware embeddings.
Empirical benchmarks confirm the theoretical scalability and efficiency.
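The sketch below illustrates how an explicit sparse leaf-space representation can yield embeddings without forming the n x n proximity matrix. It takes a truncated SVD of a scaled leaf-incidence matrix Z whose Gram matrix Z Zᵀ equals Breiman's proximity. The scikit-learn forest, the 1/√T scaling, and the uncentered truncated SVD are our assumptions, so the paper's task-aware embedding method may differ.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=5, random_state=0).fit(X, y)

# Leaf-space representation Z (n x L): Z[i, g] = 1/sqrt(T) if sample i falls in
# global leaf g, so that Z @ Z.T is Breiman's proximity matrix.
leaves = rf.apply(X)  # (n_samples, n_trees) leaf node ids
n, T = leaves.shape
node_counts = [est.tree_.node_count for est in rf.estimators_]
offsets = np.concatenate(([0], np.cumsum(node_counts)[:-1]))  # unique ids per tree
_, cols = np.unique((leaves + offsets).ravel(), return_inverse=True)
rows = np.repeat(np.arange(n), T)
Z = sp.csr_matrix((np.full(n * T, 1.0 / np.sqrt(T)), (rows, cols)),
                  shape=(n, cols.max() + 1))

# Truncated SVD of the sparse Z: if Z = U S V^T then Z Z^T = U S^2 U^T, so U_k S_k
# is a k-dimensional embedding whose Gram matrix is the best rank-k approximation of P.
k = 2
U, s, Vt = svds(Z, k=k)
order = np.argsort(s)[::-1]  # sort singular values in descending order
U, s = U[:, order], s[order]
embedding = U * s  # shape (n, k); column signs are arbitrary
```

Because the forest is trained on the labels, the coordinates reflect its supervised partition structure. Unlike kernel PCA, this sketch does not center the kernel. It touches only the n x L sparse matrix Z, which has exactly T nonzeros per row.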
Abstract
Decision forests induce supervised similarities through the partition structure of their trees. Yet forest proximity computation is still often treated as a quadratic operation in the number of samples, which limits scalability and restricts broader use in kernel and representation-learning pipelines. We introduce a unified view of leaf-collision forest proximities through a class of Separable Weighted Leaf-Collision (SWLC) kernels, showing that most existing proximities differ only in their weighting scheme while sharing a common sparse leaf-incidence structure. This yields an explicit leaf-space representation that clarifies their kernel interpretation and leads to an exact finite-sample sparse factorization of the proximity matrix, avoiding an explicit all-pairs comparison and reducing computation to sparse linear algebra over leaf collisions. We implement this framework in a…
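The factorization described in the abstract can be sketched with scikit-learn. This is a minimal illustration under our own assumptions, not the authors' implementation. `leaf_incidence` and `leaf_collision_proximity` are hypothetical helpers, leaf assignments come from `RandomForestClassifier.apply`, and the optional diagonal `leaf_weights` stand in for a weighting scheme (1 / leaf size is only an example). The proximity is obtained as the sparse product Q diag(w) Qᵀ / T instead of an all-pairs comparison. The dense all-pairs version is computed only to check the result.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def leaf_incidence(forest, X):
    """Hypothetical helper: sparse n x L binary Q, Q[i, g] = 1 if sample i lands in leaf g.

    Leaves of all trees share one global column index; only leaves that
    receive at least one sample of X get a column.
    """
    leaves = forest.apply(X)  # (n_samples, n_trees): leaf node id in each tree
    n, n_trees = leaves.shape
    # Shift node ids tree by tree so that (tree, node) pairs never collide.
    node_counts = [est.tree_.node_count for est in forest.estimators_]
    offsets = np.concatenate(([0], np.cumsum(node_counts)[:-1]))
    global_ids = (leaves + offsets).ravel()  # row-major: all trees of sample 0, then 1, ...
    _, cols = np.unique(global_ids, return_inverse=True)  # compact to occupied leaves
    rows = np.repeat(np.arange(n), n_trees)
    data = np.ones(n * n_trees)
    return sp.csr_matrix((data, (rows, cols)), shape=(n, cols.max() + 1))


def leaf_collision_proximity(Q, n_trees, leaf_weights=None):
    """Hypothetical helper: P = Q diag(w) Q^T / T; w = 1 recovers Breiman's proximity."""
    if leaf_weights is None:
        leaf_weights = np.ones(Q.shape[1])
    W = sp.diags(leaf_weights)
    return (Q @ W @ Q.T) / n_trees


X, y = make_classification(n_samples=300, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=5, random_state=0).fit(X, y)
T = rf.n_estimators

Q = leaf_incidence(rf, X)
P = leaf_collision_proximity(Q, T)  # sparse Breiman proximity, no all-pairs loop

# Example weighting (an assumption): down-weight large leaves by 1 / (leaf size on X).
leaf_sizes = np.asarray(Q.sum(axis=0)).ravel()
P_size = leaf_collision_proximity(Q, T, leaf_weights=1.0 / leaf_sizes)

# Quadratic all-pairs reference, used only to check the sparse result.
leaves = rf.apply(X)
P_dense = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
print(np.allclose(P.toarray(), P_dense))  # prints True
```

Only the weight vector changes between the two proximities; Q is shared. The cost of the sparse product grows with the number of sample pairs that share a leaf. Leaf size, set here by the arbitrary choice `min_samples_leaf=5`, therefore governs how sparse P stays.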
