Machine Learning Classifiers Do Not Improve the Prediction of Academic Risk: Evidence from Australia
Sarah Cornell-Farrow, Robert Garrard

TL;DR
This study demonstrates that machine learning classifiers do not significantly outperform logistic regression in predicting academic risk among Australian primary and middle school students, even with a large dataset of 1.2 million records.
Contribution
The paper provides evidence that machine learning models do not improve prediction accuracy over logistic regression in educational risk assessment using large-scale data.
Findings
ML models do not outperform logistic regression in predicting below-standard academic performance.
Results are consistent even with a large dataset of 1.2 million students.
Traditional logistic regression remains competitive for this prediction task.
Abstract
Machine learning methods tend to outperform traditional statistical models at prediction. In the prediction of academic achievement, ML models have not shown substantial improvement over logistic regression. So far, these results have almost entirely focused on college achievement, due to the availability of administrative datasets, and have contained relatively small sample sizes by ML standards. In this article we apply popular machine learning models to a large dataset (n = 1.2 million) containing primary and middle school performance on a standardized test given annually to Australian students. We show that machine learning models do not outperform logistic regression for detecting students who will perform in the 'below standard' band of achievement upon sitting their next test, even in a large-n setting.
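The comparison described in the abstract amounts to scoring each classifier's predicted risk against the observed 'below standard' outcome on the same held-out students. The sketch below illustrates one way to do that, assuming ROC AUC as the metric (the abstract does not name one). The function `roc_auc`, the labels, and both score lists are hypothetical toy values chosen for illustration; they are not the paper's data or results.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = student scored 'below standard' on the next test.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical predicted risks from a logistic regression and from an ML model.
logit_scores = [0.9, 0.2, 0.4, 0.5, 0.1, 0.6, 0.3, 0.35]
ml_scores = [0.8, 0.3, 0.9, 0.35, 0.2, 0.55, 0.6, 0.1]

auc_logit = roc_auc(labels, logit_scores)
auc_ml = roc_auc(labels, ml_scores)
print(f"logistic regression AUC: {auc_logit:.3f}")
print(f"ML model AUC:            {auc_ml:.3f}")
```

The pairwise loop is quadratic in the number of students, so it only suits small illustrations. At the scale of 1.2 million records, a rank-based computation or a library routine would be the practical choice.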
Taxonomy
Topics: Online Learning and Analytics
Methods: Logistic Regression
