Using Machine Learning to Identify the Most At-Risk Students in Physics Classes
Jie Yang, Seth DeVore, Dona Hewagallage, Paul Miller, Qing X. Ryan, and John Stewart

TL;DR
This study improves machine learning models for early identification of at-risk physics students by addressing class imbalance and integrating institutional data, achieving better prediction accuracy within the first weeks of class.
Contribution
It introduces techniques for handling unbalanced outcome variables in student performance prediction models and demonstrates their effectiveness across multiple datasets.
Findings
Model tuning increased DFW prediction accuracy from 16% to 43%.
Combining institutional and in-class data improved accuracy to 53%.
Demographic variables were not significant predictors.
Abstract
Machine learning algorithms have recently been used to predict students' performance in an introductory physics class. The prediction model classified students as those likely to receive an A or B or those likely to receive a grade of C, D, or F or to withdraw from the class. Early prediction could allow educational interventions to be directed, and educational resources allocated, more effectively. However, the performance metrics used in that study become unreliable when used to classify whether a student would receive an A, B, or C (the ABC outcome) or would receive a D or F or withdraw (W) from the class (the DFW outcome), because the outcome is substantially unbalanced, with between 10% and 20% of the students receiving a D, F, or W. This work presents techniques to adjust the prediction models and alternate model performance metrics more appropriate for unbalanced outcome…
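To illustrate why overall accuracy can mislead on an unbalanced outcome, the following minimal sketch scores a trivial classifier that predicts ABC for every student. The class of 100 students with 15 DFW outcomes is hypothetical, chosen only to fall inside the 10% to 20% range mentioned in the abstract. The function names `confusion_counts` and `metrics` are illustrative. Minority-class recall and balanced accuracy are shown as common metrics for unbalanced data, not necessarily the ones the authors adopt.

```python
def confusion_counts(actual, predicted, positive="DFW"):
    """Count true/false positives and negatives, treating DFW as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(actual, predicted):
    """Return overall accuracy, per-class recall, and balanced accuracy."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    # Fraction of actual DFW students who were predicted DFW.
    dfw_recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Fraction of actual ABC students who were predicted ABC.
    abc_recall = tn / (tn + fp) if (tn + fp) else 0.0
    balanced_accuracy = (dfw_recall + abc_recall) / 2
    return {
        "accuracy": accuracy,
        "dfw_recall": dfw_recall,
        "abc_recall": abc_recall,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical class: 15 of 100 students receive a D, F, or W.
actual = ["DFW"] * 15 + ["ABC"] * 85

# A trivial "model" that never flags anyone as at risk.
always_abc = ["ABC"] * len(actual)

baseline = metrics(actual, always_abc)
for name, value in baseline.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed numbers the trivial classifier is correct for 85 of 100 students, yet it identifies none of the at-risk students, so its DFW recall is 0 and its balanced accuracy is 0.5. A metric that looks at the DFW class separately exposes the failure that overall accuracy hides.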
