Application of Seq2Seq Models on Code Correction
Shan Huang (1), Xiao Zhou (2), Sang Chin (2) ((1) Boston University, Department of Physics; (2) Boston University, Department of Computer Science)

TL;DR
This paper explores the use of seq2seq models with a Pyramid Encoder for code correction in C/C++ and Java, achieving high repair rates and demonstrating effective transfer learning on small datasets.
Contribution
It introduces a Pyramid Encoder that improves the computational and memory efficiency of seq2seq models while keeping repair rates similar to non-pyramid models, and it demonstrates successful transfer learning for error classification on limited data.
Findings
75% repair rate for C/C++ code
56% repair rate for Java code
Successful transfer learning for error type classification on a small dataset (685 ITC benchmark instances)
Abstract
We apply various seq2seq models to programming language correction tasks on the Juliet Test Suite for C/C++ and Java from the Software Assurance Reference Dataset (SARD), and achieve repair rates of 75% for C/C++ and 56% for Java. We introduce a Pyramid Encoder in these seq2seq models, which substantially increases computational and memory efficiency while maintaining repair rates similar to those of their non-pyramid counterparts. We also successfully carry out an error type classification task on ITC benchmark examples (only 685 code instances) using transfer learning with models pre-trained on the Juliet Test Suite, pointing to a novel way of processing small programming language datasets.
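The abstract does not spell out the Pyramid Encoder's internals. A common design, and the one assumed here, is the pyramidal BiLSTM from Listen, Attend and Spell, in which each pyramid layer concatenates adjacent time steps before the next recurrent layer, halving the sequence length the decoder must attend over. The minimal sketch below shows only that reshaping step in plain Python; `pyramid_reduce` and `encoder_lengths` are hypothetical helpers, and the recurrent layers themselves are omitted.

```python
from typing import List

Vector = List[float]


def pyramid_reduce(frames: List[Vector]) -> List[Vector]:
    """Halve sequence length by concatenating each pair of adjacent frames.

    Assumption: an odd-length sequence is padded with one zero frame, so the
    output length is ceil(len(frames) / 2) and each output frame is twice as
    wide as an input frame.
    """
    if not frames:
        return []
    width = len(frames[0])
    if len(frames) % 2 == 1:
        frames = frames + [[0.0] * width]  # pad to an even length
    # Concatenate frame 2i with frame 2i + 1.
    return [frames[i] + frames[i + 1] for i in range(0, len(frames), 2)]


def encoder_lengths(seq_len: int, num_pyramid_layers: int) -> List[int]:
    """Sequence length after each pyramid layer (ceil-halving per layer)."""
    lengths = [seq_len]
    for _ in range(num_pyramid_layers):
        lengths.append((lengths[-1] + 1) // 2)
    return lengths


if __name__ == "__main__":
    # Five token embeddings of width 2 (illustrative values only).
    tokens = [[float(t), float(t) * 10] for t in range(5)]
    reduced = pyramid_reduce(tokens)
    print(len(reduced), len(reduced[0]))  # 3 4
    print(encoder_lengths(512, 3))  # [512, 256, 128, 64]
```

Under this assumption, an attention decoder producing U output tokens computes U x T alignment scores over T encoder states, so three pyramid layers cut that to roughly U x T/8, and the encoder's later layers also process fewer time steps. That is one plausible source of the efficiency gains the abstract reports.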
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
