Estimating Nationwide High-Dosage Tutoring Expenditures: A Predictive Model Approach
Jason Godfrey, Trisha Banerjee

TL;DR
This paper develops a predictive model using XGBoost to estimate district-level high-dosage tutoring expenditures from incomplete data, aiding policy analysis during COVID-19 learning disruptions.
Contribution
It introduces a novel application of gradient-boosted decision trees to estimate unreported fiscal data in education policy contexts.
Findings
The model estimates approximately $2.2 billion in tutoring expenditures.
It achieves an out-of-sample R^2 of 0.358.
The results demonstrate how machine learning can reconstruct fiscal patterns from sparse data.
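To make the reported fit statistic concrete, here is a minimal sketch of how an out-of-sample R^2 is computed from held-out values and predictions. The numbers are hypothetical and chosen only for easy arithmetic; they are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean of the held-out values.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out district spending and model predictions ($ thousands).
held_out_actual = [100.0, 200.0, 300.0, 400.0]
held_out_predicted = [150.0, 150.0, 350.0, 350.0]

score = r_squared(held_out_actual, held_out_predicted)
print(f"out-of-sample R^2 = {score:.3f}")
```

Read this way, an out-of-sample R^2 of 0.358 means the model's squared error on held-out districts is about 36% lower than that of always predicting the held-out mean.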
Abstract
This study applies an optimized XGBoost regression model to estimate district-level expenditures on high-dosage tutoring from incomplete administrative data. The COVID-19 pandemic caused unprecedented learning loss, with K-12 students losing up to half a grade level in certain subjects. To address this, the federal government allocated $190 billion in relief. We know from previous research that small-group tutoring, summer and after-school programs, and increased support staff were all common expenditures for districts. We don't know how much was spent in each category. Using a custom scraped dataset of over 7,000 ESSER (Elementary and Secondary School Emergency Relief) plans, we model tutoring allocations as a function of district characteristics such as enrollment, total ESSER funding, urbanicity, and school count. Extending the trained model to districts that mention tutoring but…
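The general workflow in the abstract (fit a gradient-boosted regressor on districts with known amounts, then extend it to other districts) can be illustrated with a small sketch. This is not the authors' pipeline: it swaps XGBoost for a deliberately simple pure-Python gradient boosting over depth-1 trees (stumps) so that it runs with no dependencies. All data are synthetic, only two of the named features are used, and every function and variable name is hypothetical.

```python
import random


def fit_stump(X, residuals):
    """Return the (feature, threshold, left_mean, right_mean) split with lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosted(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= thr else rm)
                 for row, p in zip(X, preds)]
    return base, stumps, learning_rate


def predict(model, row):
    base, stumps, lr = model
    return base + sum(lr * (lm if row[j] <= thr else rm) for j, thr, lm, rm in stumps)


rng = random.Random(0)


def make_district():
    """Synthetic district: [enrollment in thousands, total ESSER funding in $ millions]."""
    enrollment = rng.uniform(0.5, 50.0)
    return [enrollment, enrollment * rng.uniform(1.0, 3.0)]


# Synthetic "reported" districts: tutoring spend is an invented function of funding plus noise.
reported_X = [make_district() for _ in range(60)]
reported_y = [0.05 * row[1] + rng.gauss(0, 0.2) for row in reported_X]
# Synthetic districts whose tutoring amount is treated as unreported.
unreported_X = [make_district() for _ in range(20)]

model = fit_boosted(reported_X, reported_y)

mean_y = sum(reported_y) / len(reported_y)
baseline_mse = sum((t - mean_y) ** 2 for t in reported_y) / len(reported_y)
train_mse = sum((t - predict(model, row)) ** 2
                for row, t in zip(reported_X, reported_y)) / len(reported_y)

# Extend the fitted model to the unreported districts and sum the estimates.
estimates = [predict(model, row) for row in unreported_X]
total_estimated = sum(estimates)
print(f"baseline MSE {baseline_mse:.3f}, training MSE {train_mse:.3f}")
print(f"estimated total for unreported districts: ${total_estimated:.2f}M")
```

A production model would use a library such as XGBoost with deeper trees, regularization, and a held-out evaluation split. The stump version only shows the core idea of repeatedly fitting small trees to residuals and then predicting for districts without a reported amount.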
