A Leakage-Aware Data Layer For Student Analytics: The Capire Framework For Multilevel Trajectory Modeling
H. R. Paz

TL;DR
This paper introduces a leakage-aware data layer and framework for multilevel student trajectory modeling, improving robustness and interpretability of dropout prediction models by preventing data leakage and enabling archetype discovery.
Contribution
It proposes a novel data organization and the formalization of Value of Observation Time (VOT) to prevent data leakage in student analytics models.
Findings
Identified 13 stable student dropout archetypes
VOT-restricted features improve model robustness
Framework supports causal inference and agent-based modeling
Abstract
Predictive models for student dropout, while often accurate, frequently rely on opportunistic feature sets and suffer from undocumented data leakage, limiting their explanatory power and institutional usefulness. This paper introduces a leakage-aware data layer for student trajectory analytics, which serves as the methodological foundation for the CAPIRE framework for multilevel modelling. We propose a feature engineering design that organizes predictors into four levels: N1 (personal and socio-economic attributes), N2 (entry moment and academic history), N3 (curricular friction and performance), and N4 (institutional and macro-context variables). As a core component, we formalize the Value of Observation Time (VOT) as a critical design parameter that rigorously separates observation windows from outcome horizons, preventing data leakage by construction. An illustrative application in a…
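The idea of separating the observation window from the outcome horizon can be illustrated with a minimal sketch. This is not the paper's implementation: the `Event` record, the `vot_features` function, the feature names, and the choice to measure VOT in days since enrolment are all assumptions made for illustration. The sketch only shows the general principle that a feature is admitted when its observation date falls on or before the VOT cutoff, so later information cannot leak into the predictors.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Event:
    """One timestamped observation about a student (hypothetical schema)."""
    level: str         # feature level from the abstract: "N1", "N2", "N3", or "N4"
    name: str          # illustrative feature name, not taken from the paper
    value: float
    observed_on: date  # when the value became known to the institution


def vot_features(events, enrolled_on, vot_days):
    """Build a feature dict from events observed on or before the VOT cutoff.

    Assumption: VOT is expressed as a number of days after enrolment.
    Events dated after the cutoff belong to the outcome horizon and are dropped.
    """
    cutoff = enrolled_on + timedelta(days=vot_days)
    features = {}
    # Process in chronological order so the latest admissible value wins.
    for event in sorted(events, key=lambda e: e.observed_on):
        if event.observed_on <= cutoff:
            features[f"{event.level}_{event.name}"] = event.value
    return features


# Illustrative data for a single student.
events = [
    Event("N1", "age_at_entry", 19.0, date(2020, 3, 1)),
    Event("N3", "courses_failed", 1.0, date(2020, 7, 15)),
    Event("N3", "courses_failed", 4.0, date(2021, 7, 15)),  # after the cutoff
]
features = vot_features(events, enrolled_on=date(2020, 3, 1), vot_days=365)
print(features)
```

With a 365-day window, the second-year value of `courses_failed` is excluded, so a model trained on `features` sees only what was knowable at the cutoff.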
Taxonomy
Topics: Online Learning and Analytics · Intelligent Tutoring Systems and Adaptive Learning · Data Visualization and Analytics
