Input-Gen: Guided Generation of Stateful Inputs for Testing, Tuning, and Training
Ivan R. Ivanov, Joachim Meyer, Aiden Grossman, William S. Moses, Johannes Doerfert

TL;DR
Input-Gen automatically creates stateful, executable inputs for software testing, tuning, and training, significantly enhancing code coverage and enabling better machine learning models that understand real-world program behavior.
Contribution
We introduce Input-Gen, a compiler-based tool that automatically generates stateful inputs for arbitrary programs, improving testing and training data quality.
Findings
Generated inputs achieved 90% validity on the dataset
Single input yielded 37% code coverage on average
Guided generation of five inputs increased coverage to 45%
Abstract
The size and complexity of software applications are increasing at an accelerating pace. Source code repositories (along with their dependencies) require vast amounts of labor to keep them tested, maintained, and up to date. As the discipline now begins to also incorporate automatically generated programs, automation in testing and tuning is required to keep up with the pace, let alone reduce the present level of complexity. While machine learning has been used to understand and generate code in various contexts, machine learning models themselves are trained almost exclusively on static code without inputs, traces, or other execution-time information. This lack of training data limits the ability of these models to understand real-world problems in software. In this work we show that inputs, like code, can be generated automatically at scale. Our generated inputs are stateful, and…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Software System Performance and Reliability
