# Applying machine learning models to predict and identify factors affecting academic performance of paramedical students: a cross-sectional study

**Authors:** Omid Zarei, Maryam Talebi Moghaddam, Zahra Arefzadeh, Sadegh Moradi Vastegani, Fatemeh Zeraatpishe, Najimeh Beygi, Mohammad Ghorbani

PMC · DOI: 10.1186/s12909-026-08799-3 · BMC Medical Education · 2026-02-19

## TL;DR

This study uses machine learning to predict and understand factors affecting academic performance in paramedical students, offering insights to improve education strategies.

## Contribution

The study introduces machine learning models to analyze academic performance in paramedical education, identifying key predictors and model effectiveness.

## Key findings

- Random Forest performed best at predicting academic failure (accuracy 90.74%, F1-score 76.19%, AUC 96.19%).
- Gradient Boosting performed best at predicting academic success (accuracy 90.74%, F1-score 94.25%, AUC 93.45%).
- High school GPA was the most important predictor of both outcomes, followed by academic traits and educational factors.

## Abstract

Academic performance is a key indicator of student success and institutional effectiveness in higher education, especially in paramedical fields where precision and competence are essential. Yet traditional analytic methods often miss the complex, interacting factors that shape it. This study uses machine learning to predict performance and identify key influences among paramedical students, providing data-driven insights to inform the improvement of educational strategies.

We conducted a cross-sectional study among 135 paramedical students at Fasa University of Medical Sciences, Iran, using convenience sampling. The dataset was constructed by combining face-to-face, paper-based self-administered questionnaire responses with students’ academic records obtained from the university’s Central Education Office. Validated questionnaires assessed demographics and failure/success factors. We applied Random Forest, Decision Tree, and Gradient Boosting models to predict outcomes and estimate feature importance. Model performance was evaluated using accuracy, precision, recall, F1-score, AUC, and G-mean with an 80:20 train–test split, and results were averaged over 10 iterations.
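The evaluation protocol described above (threshold metrics including G-mean, an 80:20 hold-out split, and averaging over 10 iterations) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: `binary_metrics`, `repeated_holdout`, and the `fit_predict` callback are hypothetical names, G-mean is assumed to be the usual geometric mean of sensitivity and specificity, the split is assumed to be a plain (unstratified) random shuffle, and AUC is left out because it needs predicted scores rather than labels.

```python
import math
import random
from statistics import mean


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary task where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class is absent from the split.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: sqrt(sensitivity * specificity).
        "g_mean": math.sqrt(recall * specificity),
    }


def repeated_holdout(X, y, fit_predict, n_iter=10, test_frac=0.2, seed=0):
    """Average metrics over n_iter random train/test splits.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains any classifier and returns predicted labels for X_test.
    """
    rng = random.Random(seed)
    n_test = max(1, round(len(y) * test_frac))
    runs = []
    for _ in range(n_iter):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        y_pred = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        runs.append(binary_metrics([y[i] for i in test], y_pred))
    # Mean of each metric across all iterations.
    return {name: mean(run[name] for run in runs) for name in runs[0]}
```

With 135 students and `test_frac=0.2`, each iteration holds out 27 students, so a single misclassification moves accuracy by several percentage points; averaging over repeated splits smooths some of that variance.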

The Random Forest model performed best at predicting academic failure, achieving an accuracy of 90.74%, an F1-score of 76.19%, and an AUC of 96.19%, which highlights its ability to identify at-risk students. Conversely, Gradient Boosting performed best at predicting academic success, with an accuracy of 90.74%, an F1-score of 94.25%, and an AUC of 93.45%, demonstrating its ability to recognize improvement trends effectively. High school GPA emerged as the most important predictor of both outcomes, followed by academic traits and educational factors. Exploratory decision tree visualizations indicated possible hierarchical interactions, such as links between regional quotas, field of study, and failure risk, and between gender and success pathways. However, given the instability of decision trees on small datasets, these patterns are preliminary and require validation in larger cohorts. This study advances the application of machine learning in educational research, providing actionable insights for targeted interventions and policy refinement in paramedical education.

## Full-text entities

- **Diseases:** academic (MESH:D007859), cognitive problems (MESH:D003072), Academic Failure (MESH:D051437)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020036/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020036/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020036/full.md

---
Source: https://tomesphere.com/paper/PMC13020036