# A three-stage machine learning and inference approach for educational data

**Authors:** Ting Da

PMC · DOI: 10.1038/s41598-025-89394-2 · Scientific Reports · 2025-04-04

## TL;DR

This paper introduces a three-stage machine learning approach to identify factors influencing student academic performance using educational data.

## Contribution

A three-stage framework that combines machine learning variable selection with causal inference to uncover potentially latent causal relationships in educational datasets, even when little prior knowledge is available.

## Key findings

- Machine learning methods screen the full variable set and select candidate variables closely associated with student grades.
- A post-double-selection step then chooses the set of control variables for the causal analysis, replacing hand-picked regression controls.
- Three case studies on an open UCI dataset illustrate the effectiveness of the three-stage design.

## Abstract

A central task in educational studies is to uncover factors that drive a student’s academic performance. While existing studies have utilized meticulous regression designs, it is challenging to select appropriate controls. Machine learning, however, offers a solution whereby the entire variable set can be inspected and filtered by different optimization schemes. In that light, this paper adopts a three-stage framework to analyze and discover potentially latent causal relationships from an open dataset from UCI. In the first stage, machine learning methods are employed to select candidate variables that are closely associated with student grades, and then a “post-double-selection” process is implemented to select the set of control variables. In the final stage, three case studies are conducted to illustrate the effectiveness of the three-stage design. The model pipeline is suitable for situations where there is only minimal prior knowledge available to address a potentially causal research question.
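The post-double-selection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's code: it assumes a Lasso selector (the paper's actual stage-one methods may differ), uses synthetic data rather than the UCI dataset, and the helper names `lasso_cd` and `post_double_selection`, the penalty `alpha=0.1`, and all coefficients are hypothetical choices for illustration. The procedure runs one Lasso of the outcome on the covariates, a second Lasso of the variable of interest on the same covariates, and then an ordinary least squares fit of the outcome on the variable of interest plus the union of both selected sets.

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent Lasso: minimizes (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.copy()                      # residual y - X @ beta (beta starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n     # per-column mean square
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual (column j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def post_double_selection(y, d, X, alpha=0.1):
    """Estimate the coefficient on d, controlling for covariates chosen by two Lassos."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant columns
    sel_y = np.flatnonzero(lasso_cd(Xs, y - y.mean(), alpha))  # predictors of the outcome
    sel_d = np.flatnonzero(lasso_cd(Xs, d - d.mean(), alpha))  # predictors of the variable of interest
    controls = np.union1d(sel_y, sel_d)
    design = np.column_stack([np.ones(len(y)), d, X[:, controls]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1], controls

# Synthetic example (assumed data-generating process, true effect of d on y is 0.5).
rng = np.random.default_rng(0)
n, p = 1000, 20
X = rng.normal(size=(n, p))
d = 1.0 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(size=n)
y = 0.5 * d + 1.0 * X[:, 0] + 0.7 * X[:, 2] + rng.normal(size=n)

effect, controls = post_double_selection(y, d, X)
naive = np.polyfit(d, y, 1)[0]            # regression of y on d with no controls
print("selected controls:", controls.tolist())
print("naive slope:", round(float(naive), 3), "post-double-selection:", round(float(effect), 3))
```

In this setup the covariate in column 0 affects both `d` and `y`, so the uncontrolled slope is biased upward. Taking the union of the two selected sets is what guards against dropping a confounder that only one of the two Lasso fits picks up.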

## Full-text entities

- **Chemicals:** alcohol (MESH:D000438), fedu4 (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11968921/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11968921/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC11968921/full.md

---
Source: https://tomesphere.com/paper/PMC11968921