FuzzCoder: Byte-level Fuzzing Test via Large Language Model
Liqun Yang, Jian Yang, Chaoren Wei, Guanglin Niu, Ge Zhang, Yunli Wang, Linzheng Chai, Wanxu Xia, Hongcheng Guo, Shun Zhang, Jiaheng Liu, Yuwei Yin, Junran Peng, Jiaxin Ma, Liang Sun, Zhoujun Li

TL;DR
FuzzCoder leverages large language models fine-tuned on successful fuzzing attacks to guide input mutation, significantly improving vulnerability discovery efficiency across multiple file formats.
Contribution
This work introduces FuzzCoder, a novel LLM-based framework that learns mutation strategies from successful fuzzing attempts to enhance program vulnerability detection.
Findings
FuzzCoder improves mutation effectiveness and crash discovery rates.
It outperforms traditional fuzzing methods on various file formats.
It yields significant gains in the proportion of effective mutations and in the number of crashes found.
Abstract
Fuzzing is an important dynamic program analysis technique designed for finding vulnerabilities in complex software. Fuzzing involves presenting a target program with crafted malicious input to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs efficiently is a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations. Specifically, we develop a framework that leverages code LLMs to guide the mutation of inputs in fuzzing. The mutation process is formulated as sequence-to-sequence modeling, where the LLM receives a sequence of bytes and then outputs the mutated byte sequence. FuzzCoder is fine-tuned…
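The bytes-in, bytes-out framing described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: every name here (`Mutator`, `make_random_mutator`, `fuzz`, `toy_target`, `learned_mutator_stub`) is hypothetical, the target is a toy function rather than a real program, and the "learned" mutator is a hand-written stand-in that only shows where a fine-tuned model could plug in behind the same interface as the uniform random baseline.

```python
import random
from typing import Callable, List

# A mutator maps an input byte sequence to a mutated byte sequence.
Mutator = Callable[[bytes], bytes]


def make_random_mutator(seed: int = 0, n_mutations: int = 1) -> Mutator:
    """Baseline: overwrite uniformly chosen positions with uniformly chosen bytes."""
    rng = random.Random(seed)

    def mutate(data: bytes) -> bytes:
        if not data:
            return data
        buf = bytearray(data)
        for _ in range(n_mutations):
            pos = rng.randrange(len(buf))   # uniform position
            buf[pos] = rng.randrange(256)   # uniform replacement byte
        return bytes(buf)

    return mutate


def fuzz(target: Callable[[bytes], None], seed_input: bytes,
         mutator: Mutator, iterations: int) -> List[bytes]:
    """Run the target on mutated inputs and collect the inputs that raise."""
    crashes: List[bytes] = []
    for _ in range(iterations):
        candidate = mutator(seed_input)
        try:
            target(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes


def toy_target(data: bytes) -> None:
    """Hypothetical parser that 'crashes' when the first byte is 0xFF."""
    if data and data[0] == 0xFF:
        raise ValueError("simulated crash")


def learned_mutator_stub(data: bytes) -> bytes:
    """Assumed stand-in for a fine-tuned model: same interface, fixed edit."""
    return b"\xff" + data[1:]
```

Because both mutators share one signature, a call such as `fuzz(toy_target, b"ABCD", make_random_mutator(), 1000)` can be compared directly against `fuzz(toy_target, b"ABCD", learned_mutator_stub, 1000)` by counting the crashing inputs each one returns.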
Taxonomy
Topics: Topic Modeling
