CodeT: Code Generation with Generated Tests

Bei Chen; Fengji Zhang; Anh Nguyen; Daoguang Zan; Zeqi Lin; Jian-Guang Lou; Weizhu Chen

arXiv:2207.10397 · cs.CL · November 24, 2022 · 64 cites

TL;DR

CodeT uses a pre-trained language model to generate test cases automatically, runs the candidate code samples against them, and selects the best sample based on the results, improving pass@1 across multiple benchmarks.

Contribution

The paper introduces CodeT, a method that uses the same pre-trained language models that produce the code samples to also generate test cases for them, reducing the manual effort of writing tests and improving the selection of a correct solution.

Findings

01 CodeT improves the pass@1 metric on HumanEval to 65.8%.

02 CodeT achieves an absolute improvement of more than 20% over the previous state-of-the-art results.

03 CodeT demonstrates consistent gains across multiple benchmarks and models.

Abstract

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select the most appropriate solution from the multiple samples generated by the pre-trained language models. A natural way to evaluate the quality and correctness of a code solution is to run it against a set of test cases, but the manual creation of such test cases is often costly and time-consuming. In this paper, we propose a novel method, CodeT, that leverages the same pre-trained language models to automatically generate test cases for the code samples, thus reducing the human effort and increasing the coverage of the test scenarios. CodeT then executes the code samples using the generated test cases, and performs a dual execution agreement, which…
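
The abstract excerpt is cut off before it explains dual execution agreement, so the following is only a minimal sketch of the general idea of selecting code with generated tests. It assumes, without confirmation from the excerpt, that candidates passing the same set of generated tests are grouped together and that each group is scored by (number of candidates) × (number of tests passed). The names `passes` and `rank_by_agreement` and the toy `add` problem are hypothetical and are not taken from the microsoft/codet repository.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def passes(solution_src: str, test_src: str) -> bool:
    """Return True if the assertion in `test_src` holds for `solution_src`.

    NOTE: exec() on model-generated code is unsafe; a real system would
    run this inside a sandbox with a timeout.
    """
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # define the candidate function
        exec(test_src, namespace)      # run one generated assertion
        return True
    except Exception:
        return False


def rank_by_agreement(
    solutions: List[str], tests: List[str]
) -> List[Tuple[int, List[str]]]:
    """Group candidates by the set of tests they pass, then score each group.

    Assumed scoring rule (not stated in the excerpt):
    score = (#candidates in group) * (#tests the group passes).
    """
    groups: Dict[FrozenSet[int], List[str]] = defaultdict(list)
    for sol in solutions:
        passed = frozenset(i for i, t in enumerate(tests) if passes(sol, t))
        groups[passed].append(sol)
    scored = [(len(sols) * len(passed), sols) for passed, sols in groups.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy data standing in for model outputs (hypothetical).
solutions = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return b + a",
    "def add(a, b):\n    return a - b",   # buggy candidate
]
tests = [
    "assert add(1, 2) == 3",
    "assert add(0, 0) == 0",
    "assert add(2, 2) == 5",              # an incorrect generated test
]

ranked = rank_by_agreement(solutions, tests)
best_solution = ranked[0][1][0]  # any member of the top-scoring group
```

In this toy run the two correct candidates agree on the same two tests and form the top group, while the buggy candidate and the incorrect test are outvoted. The point of the sketch is that neither the code samples nor the generated tests need to be trusted individually; agreement between them carries the signal.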

Code & Models

Repositories

microsoft/codet (PyTorch, official)

Videos

CodeT: Code Generation with Generated Tests · SlidesLive

Taxonomy

Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research

Methods: Test