Prediction Gaps as Pathways to Explanation: Rethinking Educational Outcomes through Differences in Model Performance
Javier Garcia-Bernardo, Eva Jaspers, Weverthon Machado, Samuel Plach, Erik Jan van Leeuwen

TL;DR
This paper argues that prediction gaps, the differences in predictive performance between statistical models of varying complexity, can reveal where social theories about educational outcomes succeed or fall short, using population-scale administrative data from the Netherlands.
Contribution
It introduces prediction gaps as a tool for sociological explanation and compares logistic regression, gradient boosting, and graph neural networks to identify where simpler models underestimate the effects of social context.
Findings
Prediction gaps are generally small, suggesting that previously identified indicators, particularly parental status, capture most measurable variation in educational attainment.
Larger gaps are found for girls without fathers, highlighting complex social effects.
Prediction methods can aid in understanding and explaining social phenomena.
Abstract
Social contexts -- such as families, schools, and neighborhoods -- shape life outcomes. The key question is not simply whether they matter, but rather for whom and under what conditions. Here, we argue that prediction gaps -- differences in predictive performance between statistical models of varying complexity -- offer a pathway for identifying surprising empirical patterns (i.e., not captured by simpler models) which highlight where theories succeed or fall short. Using population-scale administrative data from the Netherlands, we compare logistic regression, gradient boosting, and graph neural networks to predict university completion using early-life social contexts. Overall, prediction gaps are small, suggesting that previously identified indicators, particularly parental status, capture most measurable variation in educational attainment. However, gaps are larger for girls growing…
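To make the idea of a prediction gap concrete, the following is a minimal sketch of how a per-subgroup gap between a simpler and a more complex model could be computed from their predicted probabilities. It is an illustration under stated assumptions, not the authors' procedure: the Brier score as the performance metric, the function names `brier_score` and `prediction_gaps`, the group labels, and the toy numbers are all hypothetical choices made here.

```python
from collections import defaultdict


def brier_score(y_true, p_pred):
    """Mean squared difference between binary outcomes and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def prediction_gaps(y_true, p_simple, p_complex, groups):
    """Per-group gap: Brier(simple model) minus Brier(complex model).

    A positive gap means the complex model predicts that group better,
    i.e. the simple model misses some pattern there.
    The metric is an assumption; the source does not specify one.
    """
    by_group = defaultdict(lambda: ([], [], []))
    for y, ps, pc, g in zip(y_true, p_simple, p_complex, groups):
        ys, simple, complex_ = by_group[g]
        ys.append(y)
        simple.append(ps)
        complex_.append(pc)
    return {
        g: brier_score(ys, simple) - brier_score(ys, complex_)
        for g, (ys, simple, complex_) in by_group.items()
    }


# Toy data (hypothetical): outcomes, two sets of predicted probabilities,
# and a subgroup label for each individual.
y_true = [1, 0, 1, 0]
p_simple = [0.5, 0.5, 0.5, 0.5]
p_complex = [0.5, 0.5, 1.0, 0.0]
groups = ["a", "a", "b", "b"]

gaps = prediction_gaps(y_true, p_simple, p_complex, groups)
```

In this toy setup both models perform identically for group "a", while the complex model is better for group "b", so only "b" shows a positive gap. In practice the two probability vectors would come from held-out predictions of the fitted models, and any suitable performance metric could replace the Brier score.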
Taxonomy
Topics: Intergenerational and Educational Inequality Studies · Computational and Text Analysis Methods · Family Dynamics and Relationships
