Classification Trees for Imbalanced and Sparse Data: Surface-to-Volume Regularization
Yichen Zhu, Cheng Li, David B. Dunson

TL;DR
This paper introduces SVR-Tree, a classification tree method that penalizes the Surface-to-Volume Ratio (SVR) of the decision set to improve performance on imbalanced and sparse data, supported by a consistency proof and comparisons on real data.
Contribution
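The paper proposes SVR-Tree, a class of classification tree algorithms that regularizes the shape of the decision set so that the tree generalizes better when one or more classes have limited training data. It proves estimation consistency for SVR-Tree and a rate of convergence for an idealized empirical risk minimizer.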
Findings
SVR-Tree outperforms existing methods on real imbalanced datasets.
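SVR-Tree is proven to be estimation-consistent, and a rate of convergence is established for an idealized empirical risk minimizer of SVR-Tree.
The authors develop a simple, computationally efficient implementation and use it in the real-data experiments.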
Abstract
Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications.
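To make the penalized quantity concrete, the sketch below computes a surface-to-volume ratio for a two-dimensional decision set and shows why an irregular set scores worse than a compact one of equal volume. It is an illustration under stated assumptions, not the paper's implementation: the decision set is discretized as a boolean grid over the unit square, the domain boundary is counted as surface, and the names `surface_to_volume_ratio` and `penalized_risk`, the additive form of the penalty, and the weight `lam` are all hypothetical.

```python
import numpy as np


def surface_to_volume_ratio(mask: np.ndarray) -> float:
    """SVR of a decision set given as a boolean grid over the unit square.

    Assumption: cells are axis-aligned rectangles, and edges on the border
    of the unit square count as surface.
    """
    n_rows, n_cols = mask.shape
    cell_h, cell_w = 1.0 / n_rows, 1.0 / n_cols
    volume = float(mask.sum()) * cell_h * cell_w
    if volume == 0.0:
        return float("inf")
    # Pad with False so cells on the border contribute their outer edges.
    padded = np.pad(mask, 1, constant_values=False)
    # A vertical edge (length cell_h) separates horizontally adjacent cells
    # whose membership differs; likewise for horizontal edges (length cell_w).
    vertical = np.sum(padded[:, 1:] != padded[:, :-1]) * cell_h
    horizontal = np.sum(padded[1:, :] != padded[:-1, :]) * cell_w
    return float((vertical + horizontal) / volume)


def penalized_risk(error: float, mask: np.ndarray, lam: float) -> float:
    """Hypothetical objective: classification error plus lam times the SVR."""
    return error + lam * surface_to_volume_ratio(mask)


# Two decision sets with the same volume (16 of 64 cells).
compact = np.zeros((8, 8), dtype=bool)
compact[2:6, 2:6] = True          # one 4x4 block

scattered = np.zeros((8, 8), dtype=bool)
scattered[::2, ::2] = True        # 16 isolated cells

svr_compact = surface_to_volume_ratio(compact)      # 8.0
svr_scattered = surface_to_volume_ratio(scattered)  # 32.0
```

With equal training error, the scattered set pays four times the penalty of the compact block, which is the kind of irregular boundary that tends to appear when a class has few samples.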
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Anomaly Detection Techniques and Applications
Methods: Interpretability
