Low-Cost and Comprehensive Non-textual Input Fuzzing with LLM-Synthesized Input Generators

Kunpeng Zhang; Zongjie Li; Daoyuan Wu; Shuai Wang; Xin Xia

arXiv:2501.19282·cs.SE·February 3, 2025

TL;DR

This paper introduces G2FUZZ, a hybrid fuzzing approach that leverages large language models to synthesize and mutate input generators for non-textual data, significantly improving bug detection and code coverage.

Contribution

The paper presents a novel hybrid fuzzing method combining LLM-driven generator synthesis with traditional mutation-based fuzzing for non-textual inputs.

Findings

01. G2FUZZ outperforms state-of-the-art tools in code coverage.

02. G2FUZZ finds more bugs across diverse formats.

03. LLM-assisted generator mutation enhances fuzzing effectiveness.

Abstract

Modern software often accepts inputs with highly complex grammars. Recent advances in large language models (LLMs) have shown that they can be used to synthesize high-quality natural language text and code that conforms to the grammar of a given input format. Nevertheless, LLMs are often incapable of generating non-textual outputs, such as images, videos, and PDF files, or are too costly to use for that purpose. This limitation hinders the application of LLMs in grammar-aware fuzzing. We present a novel approach to enabling grammar-aware fuzzing over non-textual inputs. We employ LLMs to synthesize and mutate input generators, in the form of Python scripts, that generate data conforming to the grammar of a given input format. The non-textual data yielded by these input generators are then further mutated by a traditional fuzzer (AFL++) to explore the software input space effectively. Our approach, namely G2FUZZ,…
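To make the idea of an "input generator in the form of a Python script" concrete, here is a minimal sketch of what such a script might look like for the PNG format. It is an illustration only, not output from G2FUZZ or code from the paper: the function names (`png_chunk`, `generate_png`, `write_seed`), the choice of PNG, and the default parameters are all assumptions made for this example.

```python
import struct
import zlib


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Encode one PNG chunk: 4-byte length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def generate_png(width: int = 4, height: int = 4, gray: int = 128) -> bytes:
    """Build a small 8-bit grayscale PNG filled with a single value.

    Hypothetical generator: parameters are illustrative knobs that a
    mutated variant of the script could change (size, colour, chunk set).
    """
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width, height, bit depth 8, colour type 0 (grayscale),
    # compression method 0, filter method 0, no interlacing.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter-type byte 0 (no filtering).
    raw = b"".join(b"\x00" + bytes([gray]) * width for _ in range(height))
    return (
        signature
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(raw))
        + png_chunk(b"IEND", b"")
    )


def write_seed(path: str = "seed.png") -> None:
    """Write one generated file to disk so a fuzzer can use it as a seed."""
    with open(path, "wb") as f:
        f.write(generate_png())
```

A file written by `write_seed` would go into the seed directory that AFL++ reads through its `-i` option, after which the fuzzer's byte-level mutations take over. The point of the sketch is the division of labour the abstract describes: the script encodes format structure (signature, chunk layout, checksums) that random mutation rarely produces on its own, while the fuzzer explores variations around each structurally valid file.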

Taxonomy

Topics: Software Testing and Debugging Techniques