Predicting Louisiana Public High School Dropout through Imbalanced Learning Techniques
Marmar Orooji, Jianhua Chen

TL;DR
This paper explores the use of imbalanced learning techniques to improve the prediction of high school dropout risk in Louisiana, demonstrating enhanced recall at the expense of precision across various classifiers.
Contribution
It applies multiple imbalanced learning methods in combination with several machine learning algorithms, and compares them for dropout prediction on a large administrative dataset.
Findings
Imbalanced learning improves recall significantly.
Precision decreases when using imbalanced techniques.
Base classifiers have higher precision but lower recall.
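The recall/precision trade-off above can be made concrete with the rare-class metrics the paper reports. The sketch below is a minimal Python illustration, not the authors' code. The function name `rare_class_metrics` and the toy labels are hypothetical. It also assumes that "F-measure" means the usual F1 (β = 1), which the abstract does not state explicitly.

```python
from typing import Sequence, Tuple


def rare_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1, beta: float = 1.0
) -> Tuple[float, float, float]:
    """Hypothetical helper: (precision, recall, F-measure) for the rare `positive` class.

    beta=1.0 gives F1; this default is an assumption, not taken from the paper.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f_measure = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical toy data (not from the study): 10 dropouts (label 1) among 100 students.
y_true = [1] * 10 + [0] * 90
# A conservative "base" classifier flags only 3 students, all true dropouts.
base_pred = [1] * 3 + [0] * 7 + [0] * 90
# A "rebalanced" classifier flags 8 of the 10 dropouts plus 12 non-dropouts.
rebal_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

for name, pred in [("base", base_pred), ("rebalanced", rebal_pred)]:
    p, r, f = rare_class_metrics(y_true, pred)
    print(f"{name}: precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here the hypothetical rebalanced classifier catches 8 of 10 dropouts instead of 3. Recall rises from 0.3 to 0.8, and precision falls from 1.0 to 0.4. This mirrors the qualitative pattern listed under Findings, but the numbers are invented for illustration.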
Abstract
This study is motivated by the magnitude of the Louisiana high school dropout problem and its negative impact on individual and public well-being. Our goal is to predict which students are at risk of dropping out of high school by examining a Louisiana administrative dataset. Because the dataset is imbalanced, imbalanced learning techniques, including resampling, case weighting, and cost-sensitive learning, are applied to improve prediction performance on the rare class. The performance metrics used in this study are the F-measure, recall, and precision of the rare class. We compare the performance of several machine learning algorithms, such as neural networks, decision trees, and bagging trees, in combination with the imbalanced learning approaches, using an administrative dataset of size 366k+ from the Louisiana Department of Education. Experiments show that application of imbalanced…
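The three families of imbalanced learning techniques named in the abstract can be roughly sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' pipeline, and it uses three stand-ins:

* Random oversampling stands in for "resampling". The abstract does not say which resampling scheme was used.
* Inverse-class-frequency weights stand in for "case weighting".
* A minimum-expected-cost decision threshold stands in for "cost-sensitive learning".

All function names and the cost values in the usage lines are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[List[float], int]  # (feature vector, class label)


def random_oversample(rows: Sequence[Row], seed: int = 0) -> List[Row]:
    """Hypothetical resampling helper: duplicate randomly chosen rows of each
    smaller class until every class matches the size of the largest class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[Row]] = {}
    for row in rows:
        by_class.setdefault(row[1], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced: List[Row] = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw (with replacement) the extra copies needed to reach `target`.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Hypothetical case-weighting helper: weight class c by n / (k * n_c), so
    every class contributes the same total weight to a weighted training loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def cost_sensitive_predict(p_dropout: float, cost_fn: float, cost_fp: float) -> int:
    """Hypothetical cost-sensitive decision rule: flag a student when the expected
    cost of missing a dropout exceeds the expected cost of a false alarm:
    p * cost_fn > (1 - p) * cost_fp  <=>  p > cost_fp / (cost_fp + cost_fn)."""
    threshold = cost_fp / (cost_fp + cost_fn)
    return 1 if p_dropout > threshold else 0


# Usage with made-up data: 5 rare-class rows and 45 common-class rows.
rows = [([float(i)], 1 if i < 5 else 0) for i in range(50)]
print(Counter(label for _, label in random_oversample(rows)))
print(inverse_frequency_weights([label for _, label in rows]))
print(cost_sensitive_predict(0.2, cost_fn=5.0, cost_fp=1.0))
```

In the usage lines, the rare class receives nine times the weight of the common class (5.0 versus about 0.56). Costing a miss at five times a false alarm lowers the decision threshold from 0.5 to 1/6, so a student with an estimated dropout probability of 0.2 is flagged. Each mechanism pushes the classifier to predict the rare class more often, which is the recall-for-precision trade the Findings describe.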
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Artificial Intelligence in Healthcare
Methods: Dropout
