Minimax Rates and Spectral Distillation for Tree Ensembles

Binh Duc Vu; David S. Watson

arXiv:2605.11841 · stat.ML · May 13, 2026

TL;DR

This paper provides a spectral analysis of tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs). It derives minimax-optimal convergence rates for RF regression and develops compression methods that produce substantially smaller models while maintaining predictive accuracy.

Contribution

It introduces a spectral perspective for analyzing and compressing tree ensembles, yielding minimax-optimal convergence rates for RF regression and effective techniques for model distillation.
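As a rough illustration of the spectral perspective, the sketch below assumes the simplest in-sample form of an RF smoother: each tree replaces y_i by the mean response in its leaf (an orthogonal projection onto the leaf indicators), and the forest averages these projections. The randomly cut 1-D partitions standing in for fitted trees, and the helper names `leaf_projection`, `random_partition_leaves` and `forest_smoother`, are hypothetical and not taken from the paper; bootstrap resampling and data-driven splits are ignored.

```python
import numpy as np

def leaf_projection(leaf_ids):
    """Matrix P with P[i, j] = 1/n_leaf if samples i and j share a leaf, else 0.

    P @ y replaces each y_i by the mean of y over its leaf; P is a symmetric
    orthogonal projection onto the span of the leaf-indicator vectors.
    """
    same_leaf = leaf_ids[:, None] == leaf_ids[None, :]
    leaf_sizes = same_leaf.sum(axis=1)          # size of the leaf holding sample i
    return same_leaf / leaf_sizes[:, None]

def random_partition_leaves(x, n_cuts, rng):
    """Hypothetical stand-in for a fitted tree: cut [0, 1] at random points."""
    cuts = np.sort(rng.uniform(0.0, 1.0, size=n_cuts))
    return np.searchsorted(cuts, x)             # leaf index of each sample

def forest_smoother(x, n_trees=50, n_cuts=8, seed=0):
    """Average of per-tree leaf projections, plus the mean number of non-empty leaves."""
    rng = np.random.default_rng(seed)
    leaves = [random_partition_leaves(x, n_cuts, rng) for _ in range(n_trees)]
    S = sum(leaf_projection(ids) for ids in leaves) / n_trees
    mean_leaves = np.mean([len(np.unique(ids)) for ids in leaves])
    return S, mean_leaves

x = np.sort(np.random.default_rng(1).uniform(0.0, 1.0, size=200))
S, mean_leaves = forest_smoother(x)
eigvals = np.linalg.eigvalsh(S)[::-1]           # eigenvalues in descending order

print(eigvals[:10].round(3))                    # leading eigenvalue is 1
print(np.isclose(eigvals.sum(), mean_leaves))   # True: trace = mean leaf count
```

Because every tree's projection leaves the constant vector unchanged, the leading eigenvalue is exactly 1, and the trace equals the average number of non-empty leaves, a simple effective-degrees-of-freedom measure. How quickly the remaining eigenvalues fall off is the kind of spectral quantity the paper ties to convergence rates.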

Findings

01. Eigenvalue decay governs statistical convergence rates.
02. Spectral methods enable significant model compression.
03. Distilled models retain competitive predictive performance.
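One hypothetical way to read findings 02 and 03 in code: keep only the k leading eigenpairs of a forest smoother, store k spectral coefficients, and learn a nonlinear map from inputs to eigenvector values so that new points can be scored without the ensemble. The random-partition stand-in for an RF, the cosine feature map, the least-squares fit and the choice k = 8 are all assumptions for illustration, not the authors' procedure.

```python
import numpy as np

def leaf_projection(leaf_ids):
    """Per-tree leaf-mean operator (symmetric projection onto leaf indicators)."""
    same_leaf = leaf_ids[:, None] == leaf_ids[None, :]
    return same_leaf / same_leaf.sum(axis=1)[:, None]

def cosine_features(x, degree):
    """Hypothetical nonlinear feature map: cos(pi * m * x) for m = 0..degree."""
    return np.cos(np.pi * np.outer(x, np.arange(degree + 1)))

rng = np.random.default_rng(0)
n, k, degree = 300, 8, 15
x = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.sin(6 * x) + 0.3 * rng.standard_normal(n)

# Teacher: forest smoother from 100 random partitions (stand-in for an RF).
partitions = [np.searchsorted(np.sort(rng.uniform(0.0, 1.0, size=8)), x)
              for _ in range(100)]
S = sum(leaf_projection(ids) for ids in partitions) / len(partitions)
forest_fit = S @ y

# Keep the k leading eigenpairs (eigh returns ascending order, so reverse).
eigvals, eigvecs = np.linalg.eigh(S)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
U, lam = eigvecs[:, :k], eigvals[:k]
coef = lam * (U.T @ y)          # rank-k fit on the training set is U @ coef

# Student: least-squares map from cosine features of x to eigenvector values.
W, *_ = np.linalg.lstsq(cosine_features(x, degree), U, rcond=None)

def distilled_predict(x_new):
    """Score new inputs using only W and coef; no trees or training data needed."""
    return cosine_features(x_new, degree) @ W @ coef

n_params = W.size + coef.size   # (degree + 1) * k + k stored numbers
x_test = np.linspace(0.0, 1.0, 5)
print(n_params, distilled_predict(x_test).round(2))
```

On the training points, the rank-k fit differs from the forest fit by at most the (k+1)-th eigenvalue times ||y||, so faster eigenvalue decay lets a smaller k reproduce the teacher. How well the student generalizes depends on how well the chosen feature map can represent the eigenvectors.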

Abstract

Tree ensembles such as random forests (RFs) and gradient boosting machines (GBMs) are among the most widely used supervised learners, yet their theoretical properties remain incompletely understood. We adopt a spectral perspective on these algorithms, with two main contributions. First, we derive minimax-optimal convergence rates for RF regression, showing that, under mild regularity conditions on tree growth, the eigenvalue decay of the induced kernel operator governs the statistical rate. Second, we exploit this spectral viewpoint to develop compression schemes for tree ensembles. For RFs, leading eigenfunctions of the kernel operator capture the dominant predictive directions; for GBMs, leading singular vectors of the smoother matrix play an analogous role. Learning nonlinear maps for these spectral representations yields distilled models that are orders of magnitude smaller than the…
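For GBMs, the abstract refers to singular vectors of the smoother matrix. The sketch below assumes squared-error boosting with tree structures held fixed. In that setting each round updates the residual as r ← (I − ν P_m) r, and the fitted values are S y with S = I − (I − ν P_M)⋯(I − ν P_1). This product is generally not symmetric, which is why an SVD rather than an eigendecomposition is the natural tool. Random partitions again stand in for trees, and `boosting_smoother`, the 40 rounds and the learning rate ν = 0.1 are hypothetical choices, not the paper's setup.

```python
import numpy as np

def leaf_projection(leaf_ids):
    """Per-tree leaf-mean operator: (P @ r)[i] is the mean of r over i's leaf."""
    same_leaf = leaf_ids[:, None] == leaf_ids[None, :]
    return same_leaf / same_leaf.sum(axis=1)[:, None]

def boosting_smoother(leaf_sets, nu):
    """S such that fitted values equal S @ y for L2 boosting with fixed trees.

    Each round updates the residual r <- (I - nu * P_m) r, so after M rounds
    y_hat = y - r_M = (I - (I - nu P_M) ... (I - nu P_1)) y.
    """
    n = len(leaf_sets[0])
    R = np.eye(n)
    for leaves in leaf_sets:
        R = (np.eye(n) - nu * leaf_projection(leaves)) @ R
    return np.eye(n) - R

rng = np.random.default_rng(0)
n, n_rounds, nu = 150, 40, 0.1
x = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.sin(6 * x) + 0.3 * rng.standard_normal(n)
# Hypothetical tree structures: 6 random cuts of [0, 1] per round.
leaf_sets = [np.searchsorted(np.sort(rng.uniform(0.0, 1.0, size=6)), x)
             for _ in range(n_rounds)]

S = boosting_smoother(leaf_sets, nu)

# Sanity check against an explicit boosting loop on the same structures.
fit = np.zeros(n)
for leaves in leaf_sets:
    fit += nu * leaf_projection(leaves) @ (y - fit)
print(np.allclose(fit, S @ y))                  # True

# Leading singular vectors give a rank-k compression of the smoother.
U, sing, Vt = np.linalg.svd(S)
k = 10
S_k = (U[:, :k] * sing[:k]) @ Vt[:k]
print(sing[:k].round(3))
```

By the Eckart-Young theorem, the spectral-norm error of the rank-k truncation S_k equals the (k+1)-th singular value, so the singular-value decay indicates how much of the boosted smoother a compressed model can retain.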