SeedAIchemy: LLM-Driven Seed Corpus Generation for Fuzzing
Aidan Wen, Norah A. Alzahrani, Jingzhi Jiang, Andrew Joe, Karen Shieh, Andy Zhang, Basel Alomair, David Wagner

TL;DR
SeedAIchemy uses large language models to automate the creation of high-quality seed corpora for fuzzing, outperforming naive corpora and performing comparably to manually curated ones.
Contribution
This paper presents SeedAIchemy, a novel LLM-driven tool that automates seed corpus generation for fuzzing, making it easier for developers to fuzz effectively without curating corpora by hand.
Findings
SeedAIchemy-generated corpora outperform naive corpora in fuzzing effectiveness.
Corpora produced by SeedAIchemy are comparable to manually curated ones.
Four of the tool's five modules use LLM workflows to construct search terms designed to maximize corpus quality.
Abstract
We introduce SeedAIchemy, an automated LLM-driven corpus generation tool that makes it easier for developers to implement fuzzing effectively. SeedAIchemy consists of five modules, each implementing a different approach to collecting publicly available files from the internet. Four of the five modules use large language model (LLM) workflows to construct search terms designed to maximize corpus quality. Corpora generated by SeedAIchemy perform significantly better than a naive corpus and similarly to a manually curated corpus on a diverse range of target programs and libraries.
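The abstract does not say how the modules are implemented. The following is a minimal Python sketch of the pipeline shape it describes (an LLM proposes search terms, a search step collects files, and several modules feed one corpus), written under stated assumptions: `llm` and `search` are hypothetical callables standing in for an LLM client and a web-search backend, the prompt wording is invented, and deduplication by SHA-256 content hash is an illustrative choice, not a documented SeedAIchemy step. It is not SeedAIchemy's implementation.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interfaces (not SeedAIchemy's API):
#   LLM:    prompt text -> completion text
#   Search: query text  -> iterable of (url, file_bytes) results
LLM = Callable[[str], str]
Search = Callable[[str], Iterable[Tuple[str, bytes]]]
Corpus = Dict[str, bytes]  # SHA-256 hex digest -> file contents


def llm_search_terms(llm: LLM, file_format: str, n: int = 5) -> List[str]:
    """Ask the LLM for up to n search queries, one per line (prompt wording is invented)."""
    prompt = (
        f"List {n} web search queries, one per line, that would find "
        f"diverse, publicly available sample {file_format} files."
    )
    lines = (line.strip() for line in llm(prompt).splitlines())
    return [line for line in lines if line][:n]


def search_module(llm: LLM, search: Search, file_format: str, n_terms: int = 5) -> Corpus:
    """One hypothetical LLM-driven module: LLM terms -> search results -> files."""
    corpus: Corpus = {}
    for term in llm_search_terms(llm, file_format, n_terms):
        for _url, data in search(term):
            # Key by content hash so byte-identical files are stored once.
            corpus.setdefault(hashlib.sha256(data).hexdigest(), data)
    return corpus


def build_corpus(modules: Iterable[Callable[[], Corpus]]) -> Corpus:
    """Run each module and merge its files into one deduplicated seed corpus."""
    merged: Corpus = {}
    for module in modules:
        for digest, data in module().items():
            merged.setdefault(digest, data)
    return merged


# --- Offline usage with stand-in LLM and search backends ---
def fake_llm(prompt: str) -> str:
    return "sample png images\npng test suite\n\npng corner cases"


def fake_search(query: str) -> List[Tuple[str, bytes]]:
    return [
        (f"https://example.invalid/{query}/a", b"\x89PNG" + query.encode()),
        (f"https://example.invalid/{query}/b", b"shared bytes"),  # same file for every query
    ]


def png_module() -> Corpus:
    return search_module(fake_llm, fake_search, "PNG")


def fixed_module() -> Corpus:
    # Stand-in for a module that does not use an LLM.
    data = b"hand-picked seed"
    return {hashlib.sha256(data).hexdigest(): data}


corpus = build_corpus([png_module, fixed_module])
print(len(corpus))  # 5: three query-specific files, one shared file, one hand-picked file
```

Keying files on a content hash is only a simple guard against duplicate seeds; fuzzers such as libFuzzer (`-merge=1`) and AFL++ (`afl-cmin`) also provide coverage-based corpus minimization, which this sketch does not attempt.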
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Testing and Debugging Techniques
