Impact of Code Context and Prompting Strategies on Automated Unit Test Generation with Modern General-Purpose Large Language Models
Jakub Walczak, Piotr Tomalak, Artur Laskowski

TL;DR
This study evaluates how code context and prompting strategies influence the effectiveness of automated unit test generation with large language models, showing that including docstrings and using chain-of-thought prompting yield significant improvements.
Contribution
It systematically analyzes the effects of code context and prompting strategies on LLM-generated unit tests, highlighting the effectiveness of chain-of-thought prompting and detailed context inclusion.
Findings
Including docstrings improves code adequacy.
Extending the context further to the full implementation yields smaller additional gains.
Chain-of-thought prompting achieves up to 96.3% branch coverage.
Abstract
Generative AI is gaining increasing attention in software engineering, where testing remains an indispensable reliability mechanism. According to the widely adopted testing pyramid, unit tests constitute the majority of test cases and are often schematic, requiring minimal domain expertise. Automatically generating such tests under the supervision of software engineers can significantly enhance productivity during the development phase of the software lifecycle. This paper investigates the impact of code context and prompting strategies on the quality and adequacy of unit tests generated by large language models (LLMs) from several families. The results show that including docstrings notably improves code adequacy, while further extending context to the full implementation yields markedly smaller gains. Notably, the chain-of-thought prompting strategy -- applied even to…
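To make the compared conditions concrete, the sketch below builds test-generation prompts at three context levels (signature only, signature plus docstring, full implementation), with an optional chain-of-thought instruction. It is an illustration only: the excerpt does not show the paper's actual prompts or context definitions, so the function names `extract_context` and `build_prompt`, the level names, the prompt wording, and the sample `clamp` function are all assumptions.

```python
import ast

# Hypothetical function under test, used only for illustration.
SAMPLE_SOURCE = '''
def clamp(value, low, high):
    """Return value limited to the inclusive range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value
'''


def extract_context(source: str, level: str) -> str:
    """Return the code context shown to the model.

    level is one of 'signature', 'docstring', or 'full'
    (assumed names, not taken from the paper).
    """
    tree = ast.parse(source)
    func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))
    signature = f"def {func.name}({ast.unparse(func.args)}):"
    if level == "signature":
        return signature
    if level == "docstring":
        doc = ast.get_docstring(func) or ""
        return f'{signature}\n    """{doc}"""'
    if level == "full":
        return source.strip()
    raise ValueError(f"unknown context level: {level}")


def build_prompt(source: str, level: str, chain_of_thought: bool = False) -> str:
    """Assemble a test-generation prompt for the chosen context level."""
    parts = [
        "Write pytest unit tests for the following Python function.",
        "",
        extract_context(source, level),
        "",
    ]
    if chain_of_thought:
        # Assumed wording for a chain-of-thought style instruction.
        parts.append(
            "First, list the branches and edge cases step by step; "
            "then write the tests."
        )
    else:
        parts.append("Return only the test code.")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(SAMPLE_SOURCE, "docstring", chain_of_thought=True))
```

Generating one prompt per combination of level and `chain_of_thought` value, and measuring coverage of the resulting tests, would be one way to set up a comparison of the kind the abstract describes.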
