Student Log-Data from a Randomized Evaluation of Educational Technology: A Causal Case Study
Adam C Sales, John F Pane

TL;DR
This paper explores methodological approaches for analyzing detailed log data from educational technology evaluations. It addresses challenges such as non-randomized implementation and complex data structures in order to uncover causal mechanisms and effect heterogeneity.
Contribution
It discusses three analytical approaches (an observational study, causal mediation analysis, and principal stratification) for interpreting log data in educational evaluations.
Findings
Hints may reduce posttest scores according to some analyses.
Higher hint requests could be associated with better outcomes in certain groups.
Methodological approaches can yield conflicting insights, highlighting the complexity of causal inference.
Abstract
Randomized evaluations of educational technology produce log data as a byproduct: highly granular data on student and teacher usage. These datasets could shed light on causal mechanisms, effect heterogeneity, or optimal use. However, there are methodological challenges: implementation is not randomized and is only defined for the treatment group, and log datasets have a complex structure. This paper discusses three approaches to help surmount these issues. One approach uses data from the treatment group to estimate the effect of usage on outcomes in an observational study. Another, causal mediation analysis, estimates the role of usage in driving the overall effect. Finally, principal stratification estimates overall effects for groups of students with the same "potential" usage. We analyze hint data from an evaluation of the Cognitive Tutor Algebra I curriculum using these three…
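The data structure the abstract describes, and the first of the three approaches, can be illustrated with a minimal sketch on simulated data. Everything here is an assumption made for illustration: the function names (`simulate`, `itt_effect`, `usage_slope`), the sample size, and the outcome model with its made-up coefficients. None of it is the paper's data, model, or result. The sketch shows hint counts existing only for treated students, an overall treatment-control contrast, and a within-treatment regression slope of outcome on usage.

```python
import random


def simulate(n=2000, seed=0):
    """Simulate a two-arm trial; hint counts exist only for treated students."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.randint(0, 1)                      # randomized assignment
        hints = rng.randint(0, 10) if z else None  # usage is undefined under control
        # Hypothetical outcome model with made-up coefficients (not from the paper).
        effect = 3.0 - 0.5 * hints if z else 0.0
        y = rng.gauss(0, 1) + effect
        rows.append((z, hints, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def itt_effect(rows):
    """Overall effect: mean outcome of treated minus mean outcome of control."""
    treated = [y for z, _, y in rows if z == 1]
    control = [y for z, _, y in rows if z == 0]
    return mean(treated) - mean(control)


def usage_slope(rows):
    """Least-squares slope of outcome on hint count, treatment group only."""
    pairs = [(h, y) for z, h, y in rows if z == 1]
    h_bar = mean([h for h, _ in pairs])
    y_bar = mean([y for _, y in pairs])
    cov = sum((h - h_bar) * (y - y_bar) for h, y in pairs)
    var = sum((h - h_bar) ** 2 for h, _ in pairs)
    return cov / var


rows = simulate()
print("overall effect:", round(itt_effect(rows), 2))
print("usage slope (treated only):", round(usage_slope(rows), 2))
```

In this simulation hint counts are drawn independently of everything else, so the slope recovers the simulated coefficient. In real log data usage is not randomized, so the same slope could reflect confounding rather than an effect of hints, which is the difficulty the abstract points to.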
