FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation
Yifeng He, Jicheng Wang, Yuyang Rong, Hao Chen

TL;DR
FuzzAug is a novel data augmentation method that leverages coverage-guided fuzzing to enhance neural test generation, significantly increasing dataset diversity and improving performance in automated software testing.
Contribution
This paper introduces FuzzAug, a new technique combining fuzzing with neural models to generate more diverse and semantically meaningful test cases for software testing.
Findings
FuzzAug doubles the size of the training datasets, which improves test generation performance.
FuzzAug significantly outperforms baseline methods.
Incorporating fuzzing enhances test diversity and coverage.
Abstract
Testing is essential to modern software engineering for building reliable software. Given the high cost of manually creating test cases, automated test case generation, particularly methods utilizing large language models, has become increasingly popular. These neural approaches generate semantically meaningful tests that are more maintainable than those produced by traditional automatic testing methods such as fuzzing. However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages. In this paper, we present a novel data augmentation technique, FuzzAug, that brings the benefits of fuzzing to large language models by introducing valid testing semantics and providing diverse coverage-guided inputs. By doubling the size of the training datasets, FuzzAug significantly improves performance over the baselines. This technique demonstrates…
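To make the idea of "coverage-guided inputs" turned into training tests more concrete, here is a minimal Python sketch. It is not the authors' implementation: the toy `target` function, the line-coverage tracer, the mutation operators, and the `render_test` template (including the choice to record the current return value as the expected value) are all assumptions made for illustration. The loop keeps only those mutated inputs that reach lines not seen before, then renders each kept input as a unit-test function that could be added to a training set.

```python
import random
import sys


def target(data: bytes) -> str:
    """Toy function under test (hypothetical stand-in for a real API)."""
    if len(data) < 2:
        return "short"
    if data[0] == 0x7F:
        if data[1] > 0x40:
            return "magic-high"
        return "magic-low"
    return "other"


def run_with_coverage(func, data):
    """Run func(data) and return (result, set of executed line numbers in func)."""
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__ and event == "line":
            covered.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(data)
    finally:
        sys.settrace(previous)
    return result, covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random byte-level mutation: overwrite, insert, or delete."""
    buf = bytearray(data) or bytearray(b"\x00")
    choice = rng.randrange(3)
    pos = rng.randrange(len(buf))
    if choice == 0:
        buf[pos] = rng.randrange(256)
    elif choice == 1:
        buf.insert(pos, rng.randrange(256))
    elif len(buf) > 1:
        del buf[pos]
    return bytes(buf)


def fuzz(func, seeds, iterations=2000, seed=0):
    """Coverage-guided loop: keep an input only if it reaches new lines."""
    rng = random.Random(seed)
    corpus, seen = [], set()
    for s in seeds:
        _, cov = run_with_coverage(func, s)
        if not cov <= seen:
            seen |= cov
            corpus.append(s)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        _, cov = run_with_coverage(func, candidate)
        if not cov <= seen:
            seen |= cov
            corpus.append(candidate)
    return corpus


def render_test(index: int, data: bytes, expected: str) -> str:
    """Render one kept input as unit-test source (hypothetical template)."""
    return (
        f"def test_target_fuzz_{index}():\n"
        f"    assert target({data!r}) == {expected!r}\n"
    )


# Usage: fuzz from a single seed, then turn the kept inputs into test sources.
corpus = fuzz(target, [b"aa"])
augmented = [render_test(i, d, target(d)) for i, d in enumerate(corpus)]
```

Each string in `augmented` is a small, readable test whose input was selected because it exercised new code. That is the property the abstract attributes to fuzzing-derived data, here shown on a toy scale.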
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · Machine Learning and Data Classification · Neural Networks and Applications
