Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification
Moria Mayala (LPSM (UMR_8001)), Erwan Scornet (LPSM (UMR_8001)), Charles Tillier (LMV), Olivier Wintenberger (LPSM (UMR_8001))

TL;DR
This paper gives a theoretical analysis of centered random forests trained on rebalanced data for imbalanced classification: it establishes central limit theorems (CLTs), shows how to correct the bias induced by rebalancing, and proves a variance reduction in high-imbalance settings.
Contribution
It establishes a CLT for infinite centered random forests (ICRF), shows that a forest trained on a rebalanced dataset is biased, and proposes an importance-sampling debiasing technique (IS-ICRF) with a proven variance reduction under high imbalance.
Findings
CLT with explicit rates for infinite CRF
Bias can be corrected with importance sampling
Variance reduction in high imbalance settings
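To make the object of study concrete, here is a minimal sketch of a centered random forest prediction at a query point, assuming the usual construction on the unit cube: each tree repeatedly picks a coordinate uniformly at random and halves the current cell at its midpoint. The function names, the empty-cell convention (predict 0), and the use of a finite number of trees as a Monte Carlo stand-in for the infinite forest are assumptions of this sketch, not details taken from the paper.

```python
import random


def centered_tree_cell(x, depth, rng):
    """Return the cell [lo, hi) of one centered tree that contains x.

    At each level a coordinate is drawn uniformly at random and the
    current cell is cut at its midpoint along that coordinate.
    """
    d = len(x)
    lo, hi = [0.0] * d, [1.0] * d
    for _ in range(depth):
        j = rng.randrange(d)
        mid = 0.5 * (lo[j] + hi[j])
        if x[j] < mid:
            hi[j] = mid
        else:
            lo[j] = mid
    return lo, hi


def centered_forest_predict(x, X, y, depth=3, n_trees=200, seed=0):
    """Average, over n_trees random centered trees, of the mean label
    of the training points that fall in the cell containing x."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        lo, hi = centered_tree_cell(x, depth, rng)
        labels = [
            yi
            for xi, yi in zip(X, y)
            if all(l <= v < h for v, l, h in zip(xi, lo, hi))
        ]
        # Assumed convention for this sketch: an empty cell predicts 0.
        total += sum(labels) / len(labels) if labels else 0.0
    return total / n_trees
```

Only the path of cells containing the query point is simulated, which is enough because the coordinates are drawn independently at every node. Letting `n_trees` grow approximates the infinite forest, which is the expectation over the tree randomness.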
Abstract
Many classification tasks involve imbalanced data, in which a class is largely underrepresented. Several techniques consist in creating a rebalanced dataset on which a classifier is trained. In this paper, we study such a procedure theoretically, when the classifier is a Centered Random Forest (CRF). We establish a Central Limit Theorem (CLT) on the infinite CRF with explicit rates and exact constant. We then prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. Based on an importance sampling (IS) approach, the resulting debiased estimator, called IS-ICRF, satisfies a CLT centered at the prediction function value. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data. Therefore, our theoretical analysis highlights the benefits of…
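The bias introduced by rebalancing can be illustrated with a small worked example. The sketch below assumes the simplest rebalancing scheme, in which each majority-class (label 0) point is kept independently with probability `beta`, and applies the classical odds correction for that scheme. It is a stand-in to show why a correction is needed, not the IS-ICRF estimator of the paper; all function names and parameter values are hypothetical.

```python
import random


def rebalanced_posterior(p, beta):
    """P(Y=1) seen after keeping each label-0 point with probability beta."""
    return p / (p + beta * (1.0 - p))


def debias(p_reb, beta):
    """Invert the odds shift: undersampling divides the odds of class 0
    by 1/beta, so the original probability is recovered from p_reb."""
    return beta * p_reb / (beta * p_reb + 1.0 - p_reb)


def simulate(p=0.02, beta=0.05, n=200_000, seed=0):
    """Estimate P(Y=1) on an undersampled dataset, then correct it."""
    rng = random.Random(seed)
    kept_pos = kept_neg = 0
    for _ in range(n):
        if rng.random() < p:          # minority point: always kept
            kept_pos += 1
        elif rng.random() < beta:     # majority point: kept w.p. beta
            kept_neg += 1
    p_reb_hat = kept_pos / (kept_pos + kept_neg)
    return p_reb_hat, debias(p_reb_hat, beta)


if __name__ == "__main__":
    raw, corrected = simulate()
    print(f"rebalanced estimate: {raw:.4f}, corrected: {corrected:.4f}")
```

With a true probability of 0.02 and `beta = 0.05`, the estimate on the rebalanced data concentrates near 0.29 rather than 0.02, and the correction maps it back to the original scale.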
Taxonomy
Topics: Imbalanced Data Classification Techniques
Methods: Centered Random Forest
