Empirical Likelihood for Random Forests and Ensembles

Harold D. Chiang; Yukitoshi Matsushita; Taisuke Otsu

arXiv:2511.13934 · stat.ML · November 19, 2025

TL;DR

This paper introduces an empirical likelihood (EL) framework for random forests and related ensemble methods, enabling statistical uncertainty quantification with theoretical guarantees and a simple adjustment that keeps inference accurate across different subsampling regimes.

Contribution

It develops a novel EL-based approach for ensemble methods that exploits the incomplete U-statistic structure of ensemble predictions, including a modified EL that restores pivotality, and hence accurate coverage, when subsampling is sparse.
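An ensemble prediction averages a base learner over many random subsamples of the data; averaging over all size-k subsets would give a complete U-statistic, while averaging over only a random selection of them gives an incomplete one. The sketch below illustrates that distinction under loudly stated assumptions: a hypothetical 1-nearest-neighbour base learner stands in for a tree, and the function names are illustrative, not taken from the paper. How few subsamples are drawn relative to the number of possible subsets is the "incompleteness" whose sparsity the paper's regimes refer to; the paper's exact rate conditions are not reproduced here.

```python
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]  # (feature x, response y)
Kernel = Callable[[Sequence[Point], float], float]


def one_nn_kernel(subsample: Sequence[Point], x0: float) -> float:
    """Hypothetical base learner: predict at x0 with the response of the nearest subsample point."""
    nearest = min(subsample, key=lambda p: abs(p[0] - x0))
    return nearest[1]


def complete_u_statistic(data: List[Point], k: int, x0: float,
                         kernel: Kernel = one_nn_kernel) -> float:
    """Average the kernel over all C(n, k) size-k subsets (feasible only for tiny n)."""
    values = [kernel(s, x0) for s in itertools.combinations(data, k)]
    return sum(values) / len(values)


def incomplete_u_statistic(data: List[Point], k: int, x0: float, n_subsamples: int,
                           seed: int = 0, kernel: Kernel = one_nn_kernel) -> float:
    """Ensemble-style prediction: average the kernel over randomly drawn size-k subsets."""
    rng = random.Random(seed)
    # Each draw is a subsample taken without replacement (subsampling, not bootstrapping).
    values = [kernel(rng.sample(data, k), x0) for _ in range(n_subsamples)]
    return sum(values) / n_subsamples
```

With four points at x = 0, 1, 2, 3 (responses 1, 2, 3, 4), k = 2 and x0 = 0.4, the complete average over all six pairs is 10/6, and the incomplete version approaches it as the number of random subsamples grows.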

Findings

01. Modified EL achieves accurate coverage in simulations.
02. The method is computationally efficient.
03. The approach provides reliable uncertainty quantification.

Abstract

We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent in ensemble predictions, we construct an EL statistic that is asymptotically chi-squared when subsampling induced by incompleteness is not overly sparse. Under sparser subsampling regimes, the EL statistic tends to over-cover due to loss of pivotality; we therefore propose a modified EL that restores pivotality through a simple adjustment. Our method retains key properties of EL while remaining computationally efficient. Theory for honest random forests and simulations demonstrate that modified EL achieves accurate coverage and practical reliability relative to existing inference methods.
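The abstract calibrates its EL statistic against a chi-squared limit. As background only, the sketch below computes Owen's classical empirical likelihood ratio statistic, -2 log R(mu) = 2 * sum(log(1 + lambda * (x_i - mu))), for the mean of i.i.d. scalar data, where lambda solves sum((x_i - mu) / (1 + lambda * (x_i - mu))) = 0, and compares it with the chi-squared(1) cutoff. It is not the paper's ensemble EL: that statistic is built on the incomplete U-statistic structure, and the modified EL adds an adjustment for sparse subsampling, neither of which is reproduced here. Function names are hypothetical, and the multiplier is found by plain bisection for simplicity.

```python
import math
from statistics import NormalDist
from typing import Sequence


def el_mean_statistic(x: Sequence[float], mu: float, iters: int = 200) -> float:
    """Return -2 log R(mu), Owen's empirical likelihood ratio statistic for a scalar mean."""
    d = [xi - mu for xi in x]
    if not (min(d) < 0 < max(d)):
        # mu not strictly inside the data range: this sketch treats it as rejected outright.
        return math.inf
    # The multiplier must keep every implied weight positive: 1 + lam * d_i > 0 for all i.
    lo, hi = -1.0 / max(d), -1.0 / min(d)
    # g(lam) = sum d_i / (1 + lam * d_i) strictly decreases on (lo, hi); bisect for its root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(di / (1.0 + mid * di) for di in d)
        if g > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 2.0 * sum(math.log1p(lam * di) for di in d)


def chi2_1_quantile(level: float) -> float:
    """Quantile of chi-squared(1) at `level`, using chi2(1) = Z^2 for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + level / 2.0) ** 2
```

For x = [1, 2, 3, 4, 5] the statistic is essentially zero at the sample mean 3.0 and grows as mu moves away from it; mu = 4.0 still falls below chi2_1_quantile(0.95), roughly 3.84, so it lies inside the 95% EL confidence region. Over-coverage in the abstract's sense means such regions contain the true value more often than the nominal level.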

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference