Infinite random forests for imbalanced classification tasks
Moria Mayala, Olivier Wintenberger, Charles Tillier, Clément Dombry

TL;DR
This paper analyzes subsampling and under-sampling infinite random forests (IRFs) for imbalanced classification, proposes an importance-sampling correction for the bias that under-sampling introduces, and supports the theory with simulation studies.
Contribution
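The paper proposes a debiasing procedure for under-sampling IRFs based on Importance Sampling (IS) with odds ratios, establishes asymptotic normality of the resulting estimators, and proves near-minimax optimality for Lipschitz continuous objectives when 1-NN classifiers are the base learners.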
Findings
Debiasing improves minority class prediction in imbalanced data.
The IS bagged 1-NN estimator matches the convergence rate of its subsampled counterpart while attaining lower asymptotic variance.
Simulation studies confirm empirical benefits of the proposed methods.
Abstract
We study predictive probability inference in classification tasks using random forests under class imbalance. We focus on two simplified variants of Breiman's algorithm, namely subsampling Infinite Random Forests (IRFs) and under-sampling IRFs, and establish their asymptotic normality. In the under-sampling setting, training data from both classes are resampled to achieve balance, which enhances minority class representation but introduces a biased model. To correct this, we propose a debiasing procedure based on Importance Sampling (IS) using odds ratios. We instantiate our results using 1-Nearest Neighbor (1-NN) classifiers as base learners in the IRFs and prove the nearly minimax optimality of the approach for Lipschitz continuous objectives. We also show that the IS bagged 1-NN estimator matches the convergence rate of its subsampled counterpart while attaining lower asymptotic…
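The abstract's under-sampling and odds-ratio debiasing steps can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `debias_odds` and `undersampled_bagged_1nn`, the per-class subsample size `s`, and the finite number of resamples `n_trees` (a Monte Carlo stand-in for the infinite forest) are all assumptions made for illustration. The correction shown is the standard prior-shift formula, which rescales the odds of the balanced-sample probability by the ratio of original to resampled class odds; the paper's exact estimator may differ.

```python
import numpy as np


def debias_odds(eta_star, p, p_star=0.5):
    """Map a probability estimated under class prior p_star back to prior p.

    Standard prior-shift (odds-ratio) correction; assumed here, not taken
    verbatim from the paper.
    """
    # Ratio between the original class odds and the resampled class odds.
    ratio = (p / (1.0 - p)) * ((1.0 - p_star) / p_star)
    return ratio * eta_star / (ratio * eta_star + (1.0 - eta_star))


def undersampled_bagged_1nn(X, y, x0, s, n_trees=500, rng=None):
    """Average 1-NN labels at x0 over balanced subsamples (s points per class).

    n_trees resamples approximate the infinite-forest expectation.
    """
    rng = np.random.default_rng(rng)
    idx1 = np.flatnonzero(y == 1)  # minority class indices
    idx0 = np.flatnonzero(y == 0)  # majority class indices
    dist = np.linalg.norm(X - x0, axis=1)  # distances from x0 to all points
    votes = np.empty(n_trees)
    for b in range(n_trees):
        # Draw s points from each class without replacement (balanced sample).
        sub = np.concatenate([
            rng.choice(idx1, s, replace=False),
            rng.choice(idx0, s, replace=False),
        ])
        # Label of the nearest neighbour of x0 within this subsample.
        votes[b] = y[sub[np.argmin(dist[sub])]]
    return votes.mean()
```

In this sketch, the biased estimate returned by `undersampled_bagged_1nn` would be passed through `debias_odds` with `p` set to the empirical minority-class frequency `y.mean()`, giving a probability on the scale of the original, imbalanced data.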
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Algorithms · Anomaly Detection Techniques and Applications
