Automated Feedback Generation for Competition-Level Code
Jialu Zhang, De Li, John C. Kolesar, Hanyuan Shi, Ruzica Piskac

TL;DR
This paper introduces Clef, a data-driven tool that automatically generates feedback on competition-level programming submissions by learning repair patterns from historical submission data and using them to repair incorrect programs.
Contribution
Clef is the first tool to automatically generate repairs for competition-level code by learning from historical submission data using a novel merge tree data structure.
Findings
Achieves 42.1% repair accuracy on real-world problems
Repairs 34.1% of submissions from programmers who never solved the problem
Introduces merge trees to encode large and small code changes
Abstract
Competitive programming has become a popular way for programmers to test their skills. Large-scale online programming contests attract millions of experienced programmers to compete against each other. Competition-level programming problems are challenging in nature, and participants often fail to solve the problem on their first attempt. Some online platforms for competitive programming allow programmers to practice on competition-level problems as well, and the standard feedback for an incorrect practice submission is the first test case that the submission fails. Often, the failed test case does not provide programmers with enough information to resolve the errors in their code, and they abandon the problem after several more unsuccessful attempts. We present Clef, the first data-driven tool that can generate feedback on competition-level code automatically by repairing…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability
