Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models

Quanjun Zhang; Chunrong Fang; Yi Zheng; Yaxin Zhang; Yuan Zhao; Rubing Huang; Jianyi Zhou; Yun Yang; Tao Zheng; Zhenyu Chen

arXiv:2502.16071·cs.SE·February 25, 2025

TL;DR

RetriGen is a retrieval-augmented deep assertion generation method that combines lexical and semantic retrieval with a pre-trained language model to improve automatic unit test assertion generation accuracy.

Contribution

This paper introduces RetriGen, a novel hybrid retrieval and PLM-based approach that enhances assertion generation by integrating lexical and semantic retrieval with deep learning.

Findings

1. RetriGen outperforms six state-of-the-art methods in accuracy and CodeBLEU.
2. It achieves 57.66% assertion accuracy, a 50.66% improvement over the baselines.
3. RetriGen demonstrates significant effectiveness across large-scale datasets.

Abstract

Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone in improving software quality and reliability. To reduce the manual effort of writing unit tests, some techniques have been proposed to automatically generate test assertions, with recent integration-based approaches considered state-of-the-art. Although promising, such integration-based approaches face several limitations, including reliance on lexical matching for assertion retrieval and a limited training corpus for assertion generation. This paper proposes a novel retrieval-augmented deep assertion generation approach, namely RetriGen, based on a hybrid retriever and a pre-trained language model (PLM)-based generator. Given a focal-test, RetriGen first builds a hybrid assertion retriever to search for the most relevant Test-Assert Pair from external codebases. The…
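The hybrid retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the equal weighting, and the choice of Jaccard overlap for the lexical score are all assumptions made for illustration. In particular, the "semantic" score here is only a term-frequency cosine used as a stand-in; a real semantic retriever would compare learned embeddings from a code model.

```python
import math
import re
from collections import Counter


def tokenize(code):
    """Split code into identifiers, numbers, and single punctuation tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def jaccard(a, b):
    """Lexical score: overlap of the two token sets (assumed choice)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Stand-in for a semantic score: cosine of term-frequency vectors.
    A real system would use dense embeddings from a pre-trained model."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def hybrid_retrieve(query, corpus, weight=0.5):
    """Return the (focal_test, assertion) pair with the highest combined
    score, plus that score. `weight` balances lexical vs. stand-in semantic
    similarity and is a hypothetical parameter."""
    if not corpus:
        return None, 0.0
    q = tokenize(query)
    best_pair, best_score = None, -1.0
    for focal_test, assertion in corpus:
        c = tokenize(focal_test)
        score = weight * jaccard(q, c) + (1 - weight) * cosine(q, c)
        if score > best_score:
            best_pair, best_score = (focal_test, assertion), score
    return best_pair, best_score


# Toy codebase of Test-Assert Pairs (made-up examples).
corpus = [
    ("stack.push(1); int size = stack.size();", "assertEquals(1, size)"),
    ("String name = user.getName();", "assertNotNull(name)"),
]
pair, score = hybrid_retrieve("stack.push(2); int size = stack.size();", corpus)
print(pair[1], round(score, 3))
```

In a retrieval-augmented setup like the one the abstract outlines, the retrieved pair would then be passed to the generator together with the input focal-test; how the two are combined is not covered by this sketch.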

Code & Models

Repositories

iSEngLab/RetriGen (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis