Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, and Alexey Svyatkovskiy

arXiv:2412.14308·cs.SE·January 7, 2025·3 cites

Open Access

TL;DR

This paper introduces RLSQM, a reinforcement learning approach that uses static quality metrics to improve the quality of automated unit tests generated by LLMs, significantly reducing test smells and increasing correctness.

Contribution

The paper presents a novel reinforcement learning framework that optimizes LLM-generated tests for quality metrics, outperforming baseline models and even GPT-4 in test quality.

Findings

1. LLMs generate undesirable test smells up to 37% of the time.

2. RLSQM improves test quality metrics by up to 23%.

3. RLSQM achieves nearly 100% syntactically correct code.

Abstract

Software testing is a crucial but time-consuming aspect of software development, and recently, Large Language Models (LLMs) have gained popularity for automated test case generation. However, because LLMs are trained on vast amounts of open-source code, they often generate test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose Reinforcement Learning from Static Quality Metrics (RLSQM), wherein we utilize Reinforcement Learning to generate high-quality unit tests based on static analysis-based quality metrics. First, we analyzed LLM-generated tests and showed that LLMs frequently generate undesirable test smells, up to 37% of the time. Then, we implemented a lightweight static analysis-based reward model and trained LLMs using this reward model to optimize for five code quality metrics. Our experimental results…

Taxonomy

Topics: Real-time simulation and control systems · Software Reliability and Analysis Research · Iterative Learning Control Systems

Methods: Linear Layer · Dropout · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam · Layer Normalization · Softmax · Attention Is All You Need