Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing
Rodrigo Tertulino, Ricardo Almeida

TL;DR
This paper evaluates federated learning for predicting at-risk students, comparing two levels of model complexity and the effect of local data balancing, and reports that the approach is an effective and scalable basis for privacy-preserving early-warning systems.
Contribution
The paper introduces a federated learning framework for student risk prediction and analyzes the trade-offs in model complexity (Logistic Regression vs. a Deep Neural Network) and the impact of local data balancing.
Findings
The federated model achieves a ROC AUC of approximately 85% for student risk prediction on the OULAD dataset.
Model complexity (Logistic Regression vs. a Deep Neural Network) affects both predictive performance and computational requirements.
Local data balancing improves model fairness and accuracy.
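For readers unfamiliar with the headline metric, the sketch below shows one common way to compute ROC AUC: the probability that a randomly chosen positive (at-risk) example receives a higher score than a randomly chosen negative one, with ties counted as one half. On the usual 0 to 1 scale, the reported figure corresponds to roughly 0.85. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = at-risk, 0 = not at-risk.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; a rank-based implementation is the usual choice for large datasets.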
Abstract
This study proposes and validates a Federated Learning (FL) framework to proactively identify at-risk students while preserving data privacy. Persistently high dropout rates in distance education remain a pressing institutional challenge. Using the large-scale OULAD dataset, we simulate a privacy-centric scenario where models are trained on early academic performance and digital engagement patterns. Our work investigates the practical trade-offs between model complexity (Logistic Regression vs. a Deep Neural Network) and the impact of local data balancing. The resulting federated model achieves strong predictive power (ROC AUC approximately 85%), demonstrating that FL is a practical and scalable solution for early-warning systems that inherently respects student data sovereignty.
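The setup described in the abstract can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the abstract does not name the aggregation rule, so federated averaging (FedAvg) is assumed; it does not name the balancing technique, so random oversampling of the minority class is used as a stand-in; and the data is synthetic rather than OULAD. All function names, the three simulated clients, and the hyperparameters are hypothetical. Only the Logistic Regression variant is shown.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def oversample(rows, rng):
    """Assumed balancing step: duplicate random minority rows until classes match."""
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    if not pos or not neg:
        return list(rows)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


def local_train(weights, rows, rng, lr=0.1, epochs=5):
    """Client update: a few epochs of SGD on the logistic loss, starting from the global weights."""
    w = list(weights)
    data = list(rows)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                w[j] -= lr * (p - y) * xj
    return w


def fed_avg(client_weights, client_sizes):
    """Server step: average client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client(n, pos_rate, rng):
    """Synthetic client data: x = [bias, feature_1, feature_2]; at-risk rows (y=1) have lower feature values."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < pos_rate else 0
        center = -1.0 if y == 1 else 1.0
        rows.append(([1.0, rng.gauss(center, 1.0), rng.gauss(center, 1.0)], y))
    return rows


rng = random.Random(0)
# Three simulated clients with different (imbalanced) at-risk rates.
clients = [make_client(200, rate, rng) for rate in (0.1, 0.2, 0.3)]

global_w = [0.0, 0.0, 0.0]
for _ in range(10):  # communication rounds
    updates, sizes = [], []
    for rows in clients:
        balanced = oversample(rows, rng)  # balancing happens locally, on each client
        updates.append(local_train(global_w, balanced, rng))
        sizes.append(len(rows))  # weight by the original local dataset size
    global_w = fed_avg(updates, sizes)  # only weights leave the clients, never raw rows
```

In this sketch the raw rows stay on each simulated client and only weight vectors are exchanged, which is the property the abstract refers to as preserving data privacy. Swapping `local_train` for a neural network update would give the more complex variant the paper compares against.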
