FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation

Yifeng He; Jicheng Wang; Yuyang Rong; Hao Chen

arXiv:2406.08665 · cs.SE · June 30, 2025



TL;DR

FuzzAug is a data augmentation method that uses coverage-guided fuzzing to increase the diversity and volume of training data for neural test generation, improving performance over baselines in automated software testing.

Contribution

This paper introduces FuzzAug, a data augmentation technique that brings the benefits of fuzzing (valid testing semantics and diverse coverage-guided inputs) to large language models for test generation.

Findings

01. FuzzAug doubles the size of the training datasets, which improves test generation performance.

02. FuzzAug significantly outperforms baseline methods.

03. Incorporating fuzzing enhances test diversity and coverage.

Abstract

Testing is essential to modern software engineering for building reliable software. Given the high cost of manually creating test cases, automated test case generation, particularly with large language models, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable than those produced by traditional automatic testing methods such as fuzzing. However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages. In this paper, we present a novel data augmentation technique, FuzzAug, that brings the benefits of fuzzing to large language models by introducing valid testing semantics and providing diverse coverage-guided inputs. By doubling the size of the training datasets, FuzzAug significantly improves performance over the baselines. This technique demonstrates…


Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification · Neural Networks and Applications