Novice Type Error Diagnosis with Natural Language Models
Chuqin Geng, Haolin Ye, Yixuan Li, Tianyu Han, Brigitte Pientka, and Xujie Si

TL;DR
This paper explores using natural language models to diagnose type errors in novice programming, demonstrating significant improvements over previous data-driven methods without relying on hand-engineered features.
Contribution
It introduces a novel end-to-end natural language model approach for type error localization that outperforms existing data-driven methods in accuracy.
Findings
The language model predicts type errors correctly 62% of the time.
It outperforms the previous state-of-the-art data-driven approach by 11%.
Structural probes explain performance differences.
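To make the accuracy figure above concrete, here is a minimal sketch of how a localization accuracy could be computed. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `top_k_accuracy`, the representation of program locations as integer ids, and the "hit if any of the top-k predictions is a true error location" rule are all hypothetical, and the paper's own metric may be defined differently.

```python
from typing import Sequence, Set


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[int]],
    true_locations: Sequence[Set[int]],
    k: int = 1,
) -> float:
    """Fraction of programs where a top-k predicted location is a true error location.

    Assumption: each program's candidate locations are integer ids, and
    ranked_predictions[i] lists them from most to least suspicious.
    """
    if len(ranked_predictions) != len(true_locations):
        raise ValueError("need one prediction list per program")
    if not ranked_predictions:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_locations)
        if any(loc in truth for loc in ranked[:k])
    )
    return hits / len(ranked_predictions)


# Three hypothetical ill-typed programs with made-up location ids.
predictions = [[3, 1], [2, 5], [4, 0]]
ground_truth = [{3}, {5}, {7}]
top1 = top_k_accuracy(predictions, ground_truth, k=1)
top2 = top_k_accuracy(predictions, ground_truth, k=2)
```

With this toy data, one of three programs is a hit at k=1 and two of three at k=2. Under this reading, a 62% score would mean the model's prediction matches a true error location for roughly 62 of every 100 ill-typed programs.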
Abstract
Strong static type systems help programmers eliminate many errors without much burden of supplying type annotations. However, this flexibility makes it highly non-trivial to diagnose ill-typed programs, especially for novice programmers. Compared to classic constraint solving and optimization-based approaches, the data-driven approach has shown great promise in identifying the root causes of type errors with higher accuracy. Instead of relying on hand-engineered features, this work explores natural language models for type error localization, which can be trained in an end-to-end fashion without requiring any features. We demonstrate that, for novice type error diagnosis, the language model-based approach significantly outperforms the previous state-of-the-art data-driven approach. Specifically, our model could predict type errors correctly 62% of the time, outperforming the…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
