Evaluating Federated Learning for At-Risk Student Prediction: A Comparative Analysis of Model Complexity and Data Balancing

Rodrigo Tertulino; Ricardo Almeida

arXiv:2508.18316·cs.LG·December 15, 2025

TL;DR

This paper evaluates federated learning for predicting at-risk students, comparing models of different complexity and the effect of local data balancing, and shows that federated learning is an effective, scalable basis for privacy-preserving early-warning systems.

Contribution

It introduces a federated learning framework for student risk prediction, analyzes the trade-offs of model complexity (Logistic Regression vs. a Deep Neural Network), and measures the impact of local data balancing.

Findings

01

Federated learning achieves a ROC AUC of approximately 85% in student risk prediction.

02

Model complexity (Logistic Regression vs. a Deep Neural Network) affects both predictive performance and computational requirements.

03

Local data balancing improves model fairness and accuracy.

Abstract

Persistently high dropout rates in distance education remain a pressing institutional challenge. This study proposes and validates a Federated Learning (FL) framework that proactively identifies at-risk students while preserving data privacy. Using the large-scale Open University Learning Analytics Dataset (OULAD), we simulate a privacy-centric scenario in which models are trained on early academic performance and digital engagement patterns. We investigate the practical trade-offs of model complexity (Logistic Regression vs. a Deep Neural Network) and the impact of local data balancing. The resulting federated model achieves strong predictive power (ROC AUC of approximately 85%), demonstrating that FL is a practical and scalable solution for early-warning systems that inherently respects student data sovereignty.
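The abstract does not name the aggregation rule or the balancing method, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes Federated Averaging (FedAvg) over a logistic regression model, random oversampling of the minority class on each client as the "local data balancing" step, and synthetic per-institution data standing in for OULAD features. Every function name (`oversample`, `local_train`, `fedavg`, `roc_auc`, `make_client`) and every hyperparameter is hypothetical. It uses only the standard library so that it runs as-is.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, x):
    # Predicted probability that a student is at risk (label 1).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def oversample(X, y, rng):
    # Assumed local balancing step: duplicate random minority-class rows
    # until both classes have the same count on this client.
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    if not pos or not neg:
        return X, y
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = pos + neg + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def local_train(w, X, y, lr=0.1, epochs=5):
    # Full-batch gradient descent on the logistic loss, starting from the global weights.
    w = list(w)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, t in zip(X, y):
            err = predict(w, x) - t
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def fedavg(clients, rounds=20, balance=True, seed=0):
    # FedAvg: each client trains locally on its own data (which never leaves it);
    # the server averages the returned weights, weighted by each client's
    # original (pre-balancing) sample count.
    rng = random.Random(seed)
    dim = len(clients[0][0][0])
    w_global = [0.0] * dim
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in clients:
            Xb, yb = oversample(X, y, rng) if balance else (X, y)
            updates.append(local_train(w_global, Xb, yb))
            sizes.append(len(X))
        total = sum(sizes)
        w_global = [sum(s * u[j] for s, u in zip(sizes, updates)) / total
                    for j in range(dim)]
    return w_global


def roc_auc(scores, labels):
    # ROC AUC as the probability that a random positive outscores a random negative.
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_client(n, pos_rate, rng):
    # Synthetic stand-in for one institution: features are [bias, grade, clicks].
    X, y = [], []
    for _ in range(n):
        t = 1 if rng.random() < pos_rate else 0
        grade = rng.gauss(-1.0 if t else 1.0, 1.0)
        clicks = rng.gauss(-0.5 if t else 0.5, 1.0)
        X.append([1.0, grade, clicks])
        y.append(t)
    return X, y


rng = random.Random(42)
clients = [make_client(200, rate, rng) for rate in (0.1, 0.3, 0.5)]
X_test, y_test = make_client(500, 0.3, rng)
w = fedavg(clients)
scores = [predict(w, x) for x in X_test]
print(f"federated ROC AUC: {roc_auc(scores, y_test):.3f}")
```

Calling `fedavg(clients, balance=False)` gives an unbalanced baseline for comparison. Replacing `local_train` with a small neural network would mirror the paper's complexity comparison at a higher per-round cost. The clients here deliberately differ in their at-risk rates, which is the kind of heterogeneity that makes local balancing relevant.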
