Extending Machine Learning to Predict Unbalanced Physics Course Outcomes
Seth DeVore, Jie Yang, and John Stewart

TL;DR
This study improves machine learning classification of unbalanced physics course outcomes, especially for D and F grades, by optimizing the random forest algorithm and analyzing demographic impacts on prediction accuracy.
Contribution
It extends previous methods to better predict D and F outcomes in unbalanced datasets using random forests and demographic analysis.
Findings
Random forest with threshold adjustment increased D/F prediction accuracy to 46%.
Optimized models achieved 69% accuracy for C/D/F and 46% for D/F outcomes.
Prediction sensitivity varied across demographic groups, with higher accuracy for underrepresented minorities and first-generation students.
Abstract
Machine learning algorithms have recently been used to classify students as those likely to receive an A or B or students likely to receive a C, D, or F in a physics class. The performance metrics used in that study become unreliable when the outcome variable is substantially unbalanced. This study seeks to further explore the classification of students who will receive a C, D, or F and to extend those methods to predicting whether a student will receive a D or F. The sample used for this work () is substantially unbalanced, with only 12% of the students receiving a D or F. Applying the same methods as the previous study produced a classifier that was very inaccurate, classifying only 20% of the D or F cases correctly. This study will focus on the random forest machine learning algorithm. By adjusting the random forest decision threshold, the correct classification rate of the D…
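The effect of a decision threshold on an unbalanced outcome can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sample of 25 students (3 of whom, or 12%, earn a D or F), the predicted probabilities (standing in for the fraction of trees in a random forest voting D/F), the 0.3 threshold, and the helper names `classify` and `rates`. It does not reproduce the study's data, code, or results.

```python
from typing import List, Tuple


def classify(probs: List[float], threshold: float) -> List[int]:
    """Label a case 1 (D/F) when its predicted D/F probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def rates(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) for binary labels, 1 = D/F."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # share of true D/F cases that are caught
    specificity = tn / (tn + fp)  # share of non-D/F cases correctly passed over
    return accuracy, sensitivity, specificity


# Hypothetical toy data: 3 D/F students followed by 22 others (12% minority class).
y_true = [1, 1, 1] + [0] * 22
# Hypothetical predicted D/F probabilities, in the same order as y_true.
probs = [0.55, 0.35, 0.20] + [0.40, 0.30] + [0.10] * 20

# Baseline that never predicts D/F: high accuracy, zero sensitivity.
baseline = rates(y_true, [0] * len(y_true))

# Conventional 0.5 threshold versus a lowered (assumed) 0.3 threshold.
default = rates(y_true, classify(probs, 0.5))
lowered = rates(y_true, classify(probs, 0.3))

for name, (acc, sens, spec) in [("never D/F", baseline),
                                ("threshold 0.5", default),
                                ("threshold 0.3", lowered)]:
    print(f"{name:14s} accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy sample, a classifier that never predicts D/F is still 88% accurate, which is why overall accuracy alone says little about an unbalanced outcome. Lowering the threshold from 0.5 to 0.3 raises sensitivity from 1/3 to 2/3 at the cost of two false positives and a drop in accuracy from 0.92 to 0.88.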
Taxonomy
Topics: Online Learning and Analytics · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
