New Hard-thresholding Rules based on Data Splitting in High-dimensional Imbalanced Classification
Arezou Mojiri, Abbas Khalili, Ali Zeinal Hamadani

TL;DR
This paper addresses imbalanced high-dimensional classification by proposing a new hard-thresholding rule based on data splitting that improves classification accuracy and feature selection efficiency.
Contribution
It introduces a new construction of hard-thresholding rules, based on data splitting, for imbalanced high-dimensional classification, and demonstrates asymptotic optimality and improved finite-sample performance.
Findings
The proposed method reduces misclassification rates in imbalanced high-dimensional data.
It outperforms existing methods or matches their performance with fewer features.
The method is computationally efficient and effective in real data applications.
Abstract
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method…
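The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' data-splitting rule, whose construction is not given here. It assumes Gaussian features with identity covariance, so LDA reduces to a rule based on the difference of the class means, and it applies a generic hard threshold at a universal level sqrt(2 log p (1/n0 + 1/n1)). All function names, parameter values, and the threshold choice are assumptions made for illustration.

```python
import numpy as np


def fit_mean_difference_rule(X0, X1, threshold=None):
    """Fit an LDA-type rule assuming identity covariance (an assumption).

    Returns the (optionally hard-thresholded) mean difference and the
    midpoint between the two sample class means.
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    d = m1 - m0
    if threshold is not None:
        # Hard thresholding: keep a coordinate only if its estimated
        # mean difference exceeds the threshold in absolute value.
        d = np.where(np.abs(d) > threshold, d, 0.0)
    return d, (m0 + m1) / 2.0


def predict(X, d, midpoint):
    """Assign class 1 (minority) when the linear score is positive."""
    return ((X - midpoint) @ d > 0).astype(int)


def run_demo(seed=0, p=2000, n_major=200, n_minor=10,
             n_signal=10, delta=2.5, n_test=500):
    """Compare per-class test errors with and without hard thresholding."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(p)
    mu[:n_signal] = delta  # only the first few features carry signal

    # Training data: large majority class (0), small minority class (1).
    X0 = rng.standard_normal((n_major, p))
    X1 = rng.standard_normal((n_minor, p)) + mu
    # Balanced test data, so each class's error rate is estimated separately.
    T0 = rng.standard_normal((n_test, p))
    T1 = rng.standard_normal((n_test, p)) + mu

    # Generic universal threshold for unit-variance noise (an assumption,
    # not the paper's data-splitting construction).
    t = np.sqrt(2.0 * np.log(p) * (1.0 / n_major + 1.0 / n_minor))

    results = {}
    for name, thr in (("plain", None), ("thresholded", t)):
        d, mid = fit_mean_difference_rule(X0, X1, thr)
        results[name] = {
            "majority_error": float(np.mean(predict(T0, d, mid) == 1)),
            "minority_error": float(np.mean(predict(T1, d, mid) == 0)),
        }
    return results


if __name__ == "__main__":
    for name, errors in run_demo().items():
        print(name, errors)
```

In this setup the unthresholded rule sends almost every minority-class test point to the majority class, because the noise accumulated over thousands of uninformative coordinates is much larger in the small class's sample mean. Zeroing out small mean differences removes most of that accumulated noise and narrows the gap between the two error rates.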
Taxonomy
Topics: Imbalanced Data Classification Techniques · Advanced Statistical Process Monitoring
Methods: Linear Discriminant Analysis
