Validating LLM-Generated Programs with Metamorphic Prompt Testing

Xiaoyin Wang; Dakai Zhu

arXiv:2406.06864·cs.SE·June 12, 2024

TL;DR

This paper introduces metamorphic prompt testing, a novel method to validate the correctness of LLM-generated code by checking semantic consistency across paraphrased prompts, effectively detecting errors in generated programs.

Contribution

It proposes a new validation technique for LLM-generated code based on semantic consistency checks, addressing quality and correctness concerns in AI-assisted programming.
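The consistency check described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`run_safely`, `is_suspicious`), the `min_agreement` threshold, and the agreement rule are all assumptions made for illustration, and the hand-written functions stand in for programs that an LLM would generate from the original prompt and from its paraphrases.

```python
from typing import Any, Callable, Sequence, Tuple

Program = Callable[..., Any]


def run_safely(program: Program, args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Run a program on one input; treat a raised exception as an observable outcome."""
    try:
        return ("ok", program(*args))
    except Exception as exc:  # a crash is compared by exception type
        return ("error", type(exc).__name__)


def is_suspicious(
    target: Program,
    variants: Sequence[Program],
    inputs: Sequence[Tuple[Any, ...]],
    min_agreement: float = 0.5,  # hypothetical threshold, not taken from the paper
) -> bool:
    """Flag `target` if, on any input, too few variant programs agree with its output."""
    if not variants:
        return False  # nothing to cross-check against
    for args in inputs:
        expected = run_safely(target, args)
        matches = sum(run_safely(v, args) == expected for v in variants)
        if matches / len(variants) < min_agreement:
            return True
    return False


# Stand-in for a program generated from the original prompt ("return |x|").
# It contains a deliberate bug: negative inputs are returned unchanged.
def buggy_abs(x: int) -> int:
    return x if x > 0 else x


# Stand-ins for programs generated from paraphrased prompts.
def variant_a(x: int) -> int:
    return -x if x < 0 else x


def variant_b(x: int) -> int:
    return max(x, -x)


sample_inputs = [(3,), (0,), (-4,)]
variants = [variant_a, variant_b]

flagged_buggy = is_suspicious(buggy_abs, variants, sample_inputs)
flagged_correct = is_suspicious(abs, variants, sample_inputs)
```

In this sketch the buggy program is flagged because both variants disagree with it on the input `-4`, while the built-in `abs` agrees with the variants on every input and is not flagged. A real pipeline would additionally need an LLM call to paraphrase the prompt and generate the variants, plus a way to produce the test inputs; both are outside this sketch.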

Findings

1. Detects 75% of the erroneous programs generated by GPT-4

2. False positive rate of 8.6% in error detection

3. Effective validation method for LLM-generated programs

Abstract

The latest paradigm shift in software development brings the innovation and automation afforded by Large Language Models (LLMs), exemplified by the Generative Pre-trained Transformer (GPT), which has shown a remarkable capacity to generate code autonomously, significantly reducing the manual effort required for various programming tasks. The potential benefits of LLM-generated code are vast, most notably in efficiency and rapid prototyping. However, as LLMs become increasingly integrated into the software development lifecycle, and hence the supply chain, complex and multifaceted challenges arise, because the code generated by these language models raises profound questions about quality and correctness. Research is required to comprehensively explore these critical concerns surrounding LLM-generated code. In this paper, we propose a novel solution called metamorphic prompt testing to address…

Taxonomy

Topics: Software Testing and Debugging Techniques

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer