Application of Seq2Seq Models on Code Correction

Shan Huang (1), Xiao Zhou (2), Sang Chin (2) ((1) Boston University, Department of Physics; (2) Boston University, Department of Computer Science)

arXiv:2001.11367 · cs.SE · August 6, 2020 · 1 citation

TL;DR

This paper applies seq2seq models with a Pyramid Encoder to code correction on the Juliet Test Suite, reaching repair rates of 75% for C/C++ and 56% for Java, and uses transfer learning from the pre-trained models to classify error types on the 685-instance ITC benchmark.

Contribution

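It introduces a Pyramid Encoder that improves the computational and memory efficiency of seq2seq models while keeping repair rates similar to those of their non-pyramid counterparts, and it demonstrates transfer learning for error type classification on a small dataset.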

Findings

01. 75% repair rate for C/C++ code
02. 56% repair rate for Java code
03. Effective transfer learning on small datasets

Abstract

We apply various seq2seq models to programming language correction tasks on the Juliet Test Suite for C/C++ and Java from the Software Assurance Reference Dataset (SARD), and achieve repair rates of 75% (for C/C++) and 56% (for Java) on these tasks. We introduce a Pyramid Encoder into these seq2seq models, which substantially increases computational and memory efficiency while retaining a repair rate similar to that of their non-pyramid counterparts. We also carry out an error type classification task on ITC benchmark examples (with only 685 code instances) using transfer learning with models pre-trained on the Juliet Test Suite, pointing to a novel way of processing small programming language datasets.
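The abstract does not spell out how the Pyramid Encoder achieves its efficiency gain. As a rough illustration only, the sketch below assumes the common pyramidal design in which each encoder layer merges adjacent time steps before passing them upward, so the sequence seen by higher layers (and by attention) shrinks geometrically. The function names `pyramid_reduce` and `lengths_per_layer`, the reduction factor of 2, and merging by concatenation are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List

Vector = List[float]


def pyramid_reduce(states: List[Vector], factor: int = 2) -> List[Vector]:
    """Merge every `factor` adjacent time steps into one by concatenation.

    Hypothetical helper: the merge rule (concatenation) and zero padding
    are assumptions, not the paper's stated method.
    """
    if not states:
        return []
    dim = len(states[0])
    # Pad with zero vectors so the length is divisible by `factor`.
    pad_count = (factor - len(states) % factor) % factor
    padded = states + [[0.0] * dim for _ in range(pad_count)]
    merged: List[Vector] = []
    for start in range(0, len(padded), factor):
        step: Vector = []
        for vec in padded[start:start + factor]:
            step.extend(vec)
        merged.append(step)
    return merged


def lengths_per_layer(seq_len: int, num_layers: int, factor: int = 2) -> List[int]:
    """Sequence length seen by each encoder layer under repeated reduction."""
    lengths = [seq_len]
    for _ in range(num_layers - 1):
        lengths.append(-(-lengths[-1] // factor))  # ceiling division
    return lengths


if __name__ == "__main__":
    # Three time steps of dimension 1 become two steps of dimension 2.
    print(pyramid_reduce([[1.0], [2.0], [3.0]]))
    # A 100-token input shrinks layer by layer in a 3-layer encoder.
    print(lengths_per_layer(100, 3))
```

Under these assumptions, the top layer of a three-layer encoder processes a quarter of the original time steps, which is one plausible source of the computational and memory savings the abstract reports.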

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research