Learning Lenient Parsing & Typing via Indirect Supervision

Toufique Ahmed; Premkumar Devanbu; Vincent Hellendoorn

arXiv:1910.05879 · cs.SE · March 10, 2021

TL;DR

This paper introduces a novel, indirectly supervised transformer-based approach to train a lenient parser and typer for imperfect code fragments, improving parsing and error detection in real-world coding scenarios.

Contribution

The paper presents a new method that leverages large-scale, mostly correct code data and artificial corruption to train a lenient parser without human-curated datasets.
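The corruption idea can be illustrated with a minimal sketch. It assumes that code is already tokenized and that "corruption" means deleting one punctuation token; the summary above does not specify the paper's actual corruption scheme, so `FRAGILE_TOKENS`, `corrupt`, and `make_pairs` are hypothetical names used only for illustration.

```python
import random

# Hypothetical set of tokens whose removal yields a simple syntax error.
# This is an assumption for illustration, not the paper's corruption scheme.
FRAGILE_TOKENS = {"(", ")", "{", "}", ";"}


def corrupt(tokens, rng):
    """Return a copy of `tokens` with one fragile token removed.

    If the sequence contains no fragile token, an unchanged copy is returned.
    """
    candidates = [i for i, tok in enumerate(tokens) if tok in FRAGILE_TOKENS]
    if not candidates:
        return list(tokens)
    drop = rng.choice(candidates)
    return tokens[:drop] + tokens[drop + 1:]


def make_pairs(corpus, seed=0):
    """Build (corrupted, original) training pairs from mostly correct code."""
    rng = random.Random(seed)
    return [(corrupt(toks, rng), list(toks)) for toks in corpus]


# Toy corpus of already-tokenized, well-formed statements.
corpus = [
    ["int", "x", "=", "f", "(", "y", ")", ";"],
    ["return", "x", ";"],
]
pairs = make_pairs(corpus)
for corrupted, original in pairs:
    print(" ".join(corrupted), "<-", " ".join(original))
```

Because each original sequence serves as the training target for its corrupted copy, pairs like these can be generated at scale without human-curated repairs, which is the sense in which the supervision is indirect.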

Findings

01. Achieves reasonable performance on STACKOVERFLOW fragments

02. Outperforms previous student error correction tools with 77% accuracy

03. Performs well on long, complex code snippets

Abstract

Both professional coders and teachers frequently deal with imperfect (fragmentary, incomplete, ill-formed) code. Such fragments are common in STACKOVERFLOW; students also frequently produce ill-formed code, for which instructors, TAs (or students themselves) must find repairs. In either case, the developer experience could be greatly improved if such code could somehow be parsed & typed; this makes such code more amenable to use within IDEs and allows early detection and repair of potential errors. We introduce a lenient parser, which can parse & type fragments, even ones with simple errors. Training a machine learner to leniently parse and type imperfect code requires a large training set including many pairs of imperfect code and its repair; such training sets are limited by human effort and curation. In this paper, we present a novel, indirectly supervised approach to train a…
