Ensuring Reliability in Programming Knowledge Tracing: A Re-evaluation of Attention-augmented Models and Experimental Protocols
Jaewook Kim, Hyeoncheol Kim

TL;DR
This paper critically re-evaluates attention-based programming knowledge tracing models, highlighting how implementation choices and experimental protocols influence reported performance, and offers guidelines for more reliable evaluation practices.
Contribution
It identifies key issues affecting PKT model evaluation, proposes standardized protocols, and demonstrates that performance differences diminish under controlled, consistent settings.
Findings
Attention dimension settings significantly impact performance estimates.
Incorrect attempt ordering can violate temporal causality and inflate results.
Standardized evaluation protocols reduce performance gaps between models.
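The ordering finding can be illustrated with a minimal sketch. It assumes the attempt log is a list of dicts with ISO-8601 timestamps. Only `ServerTimestamp` is named in the source; `SubjectID`, `ProblemID`, `Correct`, and the helper `build_sequences` are hypothetical names chosen for illustration, not the authors' code.

```python
from collections import defaultdict
from datetime import datetime


def build_sequences(attempts):
    """Group attempts by student and order each group by ServerTimestamp.

    Field names other than ServerTimestamp are assumptions for this sketch.
    """
    by_student = defaultdict(list)
    for row in attempts:
        by_student[row["SubjectID"]].append(row)

    sequences = {}
    for student, rows in by_student.items():
        # Sort chronologically so that step t only ever sees attempts
        # submitted before it; sorted() is stable, so ties keep log order.
        ordered = sorted(
            rows, key=lambda r: datetime.fromisoformat(r["ServerTimestamp"])
        )
        sequences[student] = [(r["ProblemID"], r["Correct"]) for r in ordered]
    return sequences


# Toy log in which student s1's attempts are stored out of order.
attempts = [
    {"SubjectID": "s1", "ProblemID": "p2", "Correct": 1,
     "ServerTimestamp": "2021-03-01T10:05:00"},
    {"SubjectID": "s1", "ProblemID": "p1", "Correct": 0,
     "ServerTimestamp": "2021-03-01T10:00:00"},
    {"SubjectID": "s2", "ProblemID": "p1", "Correct": 1,
     "ServerTimestamp": "2021-03-01T09:30:00"},
]
sequences = build_sequences(attempts)
```

Without the sort, the model would be asked to predict the `p1` attempt of `s1` after having already seen the later `p2` outcome, which is the kind of temporal leakage the finding describes.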
Abstract
Programming Knowledge Tracing (PKT) has recently advanced through hybrid approaches that integrate attention-based feature modeling for code representation with RNN-based sequential prediction. While these models report strong empirical performance, their reliability can be sensitive to subtle implementation and experimental design choices. This study revisits representative PKT models and shows that reported gains can be substantially influenced by model configuration and sequence construction practices. We identify issues in attention dimension settings that affect performance estimates, and demonstrate that improper ordering of student attempts, such as ignoring ServerTimestamp, can violate temporal causality and lead to overly optimistic results. To ensure consistent evaluation, hyperparameters are selected via grid search guided by a single designated fold and then fixed uniformly…
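The hyperparameter protocol mentioned in the abstract can be sketched as follows. The abstract is cut off after "fixed uniformly", so reusing the selected configuration for every fold is an assumption here. The functions `select_config` and `run_protocol`, the `evaluate(config, fold)` callback, and the grid keys `lr` and `dim` are all hypothetical and stand in for whatever training and scoring routine is actually used.

```python
from itertools import product


def select_config(grid, evaluate, designated_fold):
    """Grid search on one designated fold; returns the best-scoring config."""
    keys = sorted(grid)
    best_config, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        score = evaluate(config, designated_fold)
        if score > best_score:
            best_config, best_score = config, score
    return best_config


def run_protocol(grid, evaluate, folds, designated_fold=0):
    """Pick hyperparameters once, then reuse them unchanged on every fold."""
    config = select_config(grid, evaluate, designated_fold)
    return config, [evaluate(config, fold) for fold in folds]


def toy_evaluate(config, fold):
    # Stand-in for "train on this fold and return a validation metric".
    # Peaks at lr=0.01, dim=64; the fold index is ignored in this toy.
    return -abs(config["lr"] - 0.01) - abs(config["dim"] - 64) / 1000


grid = {"lr": [0.001, 0.01, 0.1], "dim": [32, 64, 128]}
config, scores = run_protocol(grid, toy_evaluate, folds=[0, 1, 2])
```

Tuning once and freezing the result keeps every model under the same search budget, so a gap between models is less likely to reflect per-fold tuning effort.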
