SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning
Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui

TL;DR
This paper presents SentiDrop, a multi-modal machine learning model that combines sentiment analysis of student comments with socio-demographic and behavioral data to predict dropout in distance learning, reporting 84% accuracy on unseen data.
Contribution
It introduces a novel multi-modal approach integrating BERT-based sentiment analysis with XGBoost on educational data for dropout prediction.
Findings
Achieved 84% accuracy on unseen data, outperforming baseline models.
Demonstrated improved precision and F1-score with the combined model.
Validated the effectiveness of multi-source data integration for dropout prediction.
Abstract
School dropout is a serious problem in distance learning, where early detection is crucial for effective intervention and student perseverance. Predicting student dropout using available educational data is a widely researched topic in learning analytics. Our partner's distance learning platform highlights the importance of integrating diverse data sources, including socio-demographic data, behavioral data, and sentiment analysis, to accurately predict dropout risks. In this paper, we introduce a novel model that combines sentiment analysis of student comments using the Bidirectional Encoder Representations from Transformers (BERT) model with socio-demographic and behavioral data analyzed through Extreme Gradient Boosting (XGBoost). We fine-tuned BERT on student comments to capture nuanced sentiments, which were then merged with key features selected using feature importance techniques…
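The fusion step described in the abstract, where comment sentiment is merged with selected socio-demographic and behavioral features, can be illustrated with a minimal sketch. This is not the authors' implementation. The word-list scorer `sentiment_score` is a hypothetical stand-in for the fine-tuned BERT model, the feature names `age` and `logins_per_week` are invented for illustration, and the neutral default for students without comments is an assumption. The resulting rows are the kind of input a gradient-boosted classifier such as XGBoost would be trained on; the classifier itself is left out.

```python
from statistics import mean

# Hypothetical stand-in for the fine-tuned BERT sentiment model: a tiny
# word-list scorer returning a value in [-1, 1]. Not the paper's model.
POSITIVE = {"great", "helpful", "clear", "enjoy"}
NEGATIVE = {"confusing", "hard", "lost", "quit"}


def sentiment_score(comment: str) -> float:
    """Score one comment as (positive - negative) / matched words."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def build_feature_rows(tabular, comments, feature_names):
    """Join per-student tabular features with mean comment sentiment."""
    rows = {}
    for student_id, feats in tabular.items():
        scores = [sentiment_score(c) for c in comments.get(student_id, [])]
        # Assumption: students without comments get a neutral score of 0.0.
        sentiment = mean(scores) if scores else 0.0
        rows[student_id] = [feats[name] for name in feature_names] + [sentiment]
    return rows


# Illustrative data only; feature names are invented for this sketch.
tabular = {
    "s1": {"age": 24, "logins_per_week": 5},
    "s2": {"age": 31, "logins_per_week": 1},
}
comments = {
    "s1": ["Great course, very clear!"],
    "s2": ["I am lost.", "Too hard and confusing."],
}
rows = build_feature_rows(tabular, comments, ["age", "logins_per_week"])
```

Each row holds the tabular features followed by a single sentiment column, so the sentiment signal enters the classifier as one more feature alongside the others.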
Taxonomy
Methods: Dropout · BERT
