Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go
Yashshi Pipalani, Hritik Raj, Rajat Ghosh, Vaishnavi Bhargava, Debojyoti Dutta

TL;DR
This paper introduces Go-UT-Bench, a dataset of Golang code and unit tests, to improve LLM performance on unit test generation, addressing data imbalance issues in low-resource languages.
Contribution
It provides a new benchmark dataset for fine-tuning LLMs on unit test generation in Go, demonstrating improved performance over base models.
Findings
Fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.
The dataset covers 5264 code-test pairs from 10 repositories.
Fine-tuning on the dataset improves LLM performance on unit test generation, a software engineering task that goes beyond code autocompletion.
Abstract
Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation. To address this gap, we introduce Go-UT-Bench, a benchmark dataset of 5264 pairs of code and unit tests, drawn from 10 permissively licensed Golang repositories spanning diverse domains. We evaluate its effectiveness as a fine-tuning dataset across two LLM families: mixture-of-experts and dense decoders. Our results show that fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.
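The "more than 75% of benchmark tasks" figure can be read as a per-task win rate. The following is a minimal sketch of that arithmetic under one plausible reading, not the authors' evaluation code: the `TaskScore` type, the `WinRate` function, and the scores in `main` are all hypothetical placeholders and do not come from the paper.

```go
package main

import "fmt"

// TaskScore holds the scores of a base model and its fine-tuned
// counterpart on one benchmark task. The type and its fields are
// hypothetical; the paper's actual metrics are not specified here.
type TaskScore struct {
	Task      string
	Base      float64
	FineTuned float64
}

// WinRate returns the fraction of tasks on which the fine-tuned
// score is strictly higher than the base score. Ties count as
// non-wins. An empty input yields 0.
func WinRate(scores []TaskScore) float64 {
	if len(scores) == 0 {
		return 0
	}
	wins := 0
	for _, s := range scores {
		if s.FineTuned > s.Base {
			wins++
		}
	}
	return float64(wins) / float64(len(scores))
}

func main() {
	// Placeholder numbers for illustration only; not results from the paper.
	scores := []TaskScore{
		{Task: "task-1", Base: 0.40, FineTuned: 0.55},
		{Task: "task-2", Base: 0.62, FineTuned: 0.70},
		{Task: "task-3", Base: 0.51, FineTuned: 0.49},
		{Task: "task-4", Base: 0.30, FineTuned: 0.45},
	}
	fmt.Printf("win rate: %.2f\n", WinRate(scores)) // prints "win rate: 0.75"
}
```

Counting ties as non-wins keeps the measure conservative; a different tie rule or a different definition of "task" would change the reported fraction.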
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Logic, programming, and type systems
