Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models
Quanjun Zhang, Chunrong Fang, Yi Zheng, Yaxin Zhang, Yuan Zhao, Rubing Huang, Jianyi Zhou, Yun Yang, Tao Zheng, Zhenyu Chen

TL;DR
RetriGen is a retrieval-augmented deep assertion generation method that combines lexical and semantic retrieval with a pre-trained language model to improve automatic unit test assertion generation accuracy.
Contribution
This paper introduces RetriGen, an approach that pairs a hybrid assertion retriever, combining lexical and semantic matching, with a PLM-based generator to improve assertion generation.
Findings
RetriGen outperforms six state-of-the-art methods in accuracy and CodeBLEU.
It achieves 57.66% assertion accuracy, a 50.66% improvement over baselines.
RetriGen demonstrates significant effectiveness across large-scale datasets.
Abstract
Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone in improving software quality and reliability. To reduce manual efforts in writing unit tests, some techniques have been proposed to automatically generate test assertions, with recent integration-based approaches considered state-of-the-art. Despite being promising, such integration-based approaches face several limitations, including reliance on lexical matching for assertion retrieval and a limited training corpus for assertion generation. This paper proposes a novel retrieval-augmented deep assertion generation approach, namely RetriGen, based on a hybrid retriever and a pre-trained language model (PLM)-based generator. Given a focal-test, RetriGen first builds a hybrid assertion retriever to search for the most relevant Test-Assert Pair from external codebases. The…
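The hybrid retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `alpha` weight, and the scoring formula are all hypothetical, and the character-trigram vector is only a stand-in for whatever learned semantic encoder the paper uses. It shows one plausible way to combine a lexical score (Jaccard token overlap) with a vector-similarity score (cosine) to pick the most relevant Test-Assert Pair for a given focal-test.

```python
import math
import re
from collections import Counter


def tokenize(code):
    # Split code into identifiers, numbers, and single punctuation characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    # Lexical similarity: size of the token-set intersection over the union.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def embed(code):
    # ASSUMPTION: character-trigram counts stand in for a learned code encoder.
    text = " ".join(tokenize(code)).lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, alpha=0.5):
    # corpus: list of (focal_test, assertion) pairs.
    # alpha is a hypothetical weight balancing lexical vs. vector similarity.
    q_tokens, q_vec = tokenize(query), embed(query)

    def score(pair):
        focal_test = pair[0]
        lexical = jaccard(q_tokens, tokenize(focal_test))
        semantic = cosine(q_vec, embed(focal_test))
        return alpha * lexical + (1 - alpha) * semantic

    return max(corpus, key=score)


corpus = [
    ("testAdd() { int r = calc.add(1, 2); }", "assertEquals(3, r);"),
    ("testIsEmpty() { boolean e = list.isEmpty(); }", "assertTrue(e);"),
]
query = "testAddNegative() { int r = calc.add(-1, 2); }"
best_focal_test, best_assertion = retrieve(query, corpus)
print(best_assertion)
```

In a retrieval-augmented setup, the retrieved assertion would then be passed to the generator alongside the query focal-test. How RetriGen formats that input is not stated in this summary.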
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
