FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Liqun Yang; Jian Yang; Chaoren Wei; Guanglin Niu; Ge Zhang; Yunli Wang; Linzheng Chai; Wanxu Xia; Hongcheng Guo; Shun Zhang; Jiaheng Liu; Yuwei Yin; Junran Peng; Jiaxin Ma; Liang Sun; Zhoujun Li

arXiv:2409.01944 · cs.CL · September 4, 2024

TL;DR

FuzzCoder leverages large language models fine-tuned on successful fuzzing attacks to guide input mutation, significantly improving vulnerability discovery efficiency across multiple file formats.

Contribution

This work introduces FuzzCoder, a novel LLM-based framework that learns mutation strategies from successful fuzzing attempts to enhance program vulnerability detection.
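The idea of learning from successful fuzzing attempts can be illustrated with a minimal sketch of how training pairs might be built from a fuzzer's history. It assumes a hypothetical log format (`FuzzRecord`, whose `interesting` flag means the mutation reached new coverage or caused a crash) and serializes bytes as space-separated hex; neither the record type nor the text encoding is taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FuzzRecord:
    # Hypothetical log entry for one mutation attempt by a heuristic fuzzer.
    seed: bytes         # input before mutation
    mutated: bytes      # input after mutation
    interesting: bool   # assumed flag: the mutation hit new coverage or crashed


def to_text(data: bytes) -> str:
    """Serialize bytes as space-separated hex so a text LLM can consume them."""
    return data.hex(" ")


def build_pairs(log: list[FuzzRecord]) -> list[dict[str, str]]:
    """Keep only successful mutations and turn each into an input/output text pair."""
    return [
        {"input": to_text(rec.seed), "output": to_text(rec.mutated)}
        for rec in log
        if rec.interesting
    ]


log = [
    FuzzRecord(b"\x7fELF", b"\x7fELG", interesting=True),
    FuzzRecord(b"abc", b"abd", interesting=False),
]
print(build_pairs(log))  # [{'input': '7f 45 4c 46', 'output': '7f 45 4c 47'}]
```

Filtering on the success flag is what makes the dataset teach the model which mutations tend to pay off, rather than imitating the fuzzer's uniform random choices.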

Findings

01. FuzzCoder improves mutation effectiveness and crash discovery rates.

02. It outperforms traditional fuzzing methods across various file formats.

03. It achieves significant gains in the proportion of effective mutations and in crash counts.
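The two quantities in finding 03 can be computed from per-mutation outcomes, as in the sketch below. The definitions are assumptions: a mutation is counted as effective if it produced a crash or a new execution path, and the outcome labels `'crash'`, `'new_path'`, and `'none'` are hypothetical.

```python
from collections import Counter


def mutation_metrics(outcomes: list[str]) -> tuple[float, int]:
    """Return (proportion of effective mutations, number of crashes).

    outcomes: one assumed label per executed mutation: 'crash', 'new_path', or 'none'.
    """
    counts = Counter(outcomes)  # missing labels count as 0
    total = len(outcomes)
    effective = counts["crash"] + counts["new_path"]
    proportion = effective / total if total else 0.0
    return proportion, counts["crash"]


print(mutation_metrics(["none", "crash", "new_path", "none"]))  # (0.5, 1)
```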

Abstract

Fuzzing is an important dynamic program analysis technique designed to find vulnerabilities in complex software. Fuzzing presents a target program with crafted malicious inputs to cause crashes, buffer overflows, memory errors, and exceptions. Crafting malicious inputs efficiently is a difficult open problem, and the best approaches often apply uniform random mutations to pre-existing valid inputs. In this work, we propose to adopt fine-tuned large language models (FuzzCoder) to learn patterns in the input files from successful attacks to guide future fuzzing explorations. Specifically, we develop a framework that leverages code LLMs to guide the mutation process of inputs in fuzzing. The mutation process is formulated as sequence-to-sequence modeling, where the LLM receives a sequence of bytes and outputs the mutated byte sequence. FuzzCoder is fine-tuned…
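The sequence-to-sequence framing can be sketched as a single mutation step: the input bytes are serialized to text, passed to a model, and the model's text output is parsed back into a mutated byte sequence. The hex encoding, the fallback to the original input on malformed output, and `stub_model` (a hypothetical stand-in for the fine-tuned LLM) are all assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, Optional


def encode(data: bytes) -> str:
    """Serialize input bytes as space-separated hex (assumed prompt format)."""
    return data.hex(" ")


def decode(text: str) -> Optional[bytes]:
    """Parse the model's hex output back into bytes; None if it is not valid hex."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        return None


def mutate(data: bytes, model: Callable[[str], str]) -> bytes:
    """One seq2seq mutation step: bytes -> text -> model -> text -> bytes."""
    out = decode(model(encode(data)))
    # Fall back to the original input if the output is malformed or empty.
    return out if out else data


def stub_model(prompt: str) -> str:
    # Hypothetical stand-in for the fine-tuned LLM: overwrites the first byte with 0xff.
    tokens = prompt.split()
    tokens[0] = "ff"
    return " ".join(tokens)


print(mutate(b"\x00\x01\x02", stub_model))  # b'\xff\x01\x02'
print(mutate(b"\x00", lambda p: "not hex"))  # b'\x00'
```

Falling back to the unmodified input keeps a fuzzing loop running when the model emits text that is not valid hex, at the cost of one wasted execution.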

Code & Models

Repositories

weimo3221/FUZZ-CODER (PyTorch, official)

Taxonomy

Topics: Topic Modeling