Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study
André Storhaug, Jingyue Li

TL;DR
This study evaluates parameter-efficient fine-tuning methods for large language models in unit test generation, comparing their effectiveness and cost-efficiency against full fine-tuning across multiple models and benchmarks.
Contribution
It provides the first comprehensive empirical comparison of PEFT techniques like LoRA and prompt tuning for unit test generation with large language models.
Findings
LoRA performs comparably to full fine-tuning in several cases.
Prompt tuning is the most cost-effective method for large models.
Tuned models generate more diverse tests but fewer executable ones.
Abstract
Parameter-efficient fine-tuning (PEFT) methods, which fine-tune only a subset of model parameters, offer a promising solution by reducing the computational costs of tuning large language models (LLMs) while maintaining their performance. Existing studies have explored using PEFT and LLMs for various code-related tasks and found that the effectiveness of PEFT techniques is task-dependent. The state-of-the-art is limited to using LLMs with full fine-tuning to generate unit tests. The application of PEFT techniques in unit test generation remains underexplored. This paper investigates both full fine-tuning and various PEFT methods, including LoRA, (IA)^3, and prompt tuning, across thirteen models of different architectures and sizes. We use well-established benchmark datasets to evaluate their effectiveness in unit test generation and measure syntax correctness, CodeBLEU, pass@1,…
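To make the "subset of model parameters" idea concrete, here is a minimal sketch of the parameter arithmetic behind LoRA, one of the PEFT methods named in the abstract. LoRA freezes a weight matrix of shape d_out × d_in and trains two low-rank factors of shapes d_out × r and r × d_in instead. The layer dimensions and rank below are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter on the same matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the original
    weight matrix stays frozen.
    """
    return rank * (d_out + d_in)


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Share of the layer's parameters that LoRA actually trains."""
    return lora_params(d_out, d_in, rank) / full_finetune_params(d_out, d_in)


if __name__ == "__main__":
    # Assumed example layer: a 4096 x 4096 projection with rank 8.
    d_out, d_in, rank = 4096, 4096, 8
    print("full fine-tuning:", full_finetune_params(d_out, d_in))
    print("LoRA:", lora_params(d_out, d_in, rank))
    print("fraction trained:", trainable_fraction(d_out, d_in, rank))
```

Under these assumed dimensions, the adapter trains well under one percent of the layer's weights, which is the source of the cost savings the study weighs against test-generation quality.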
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
