Learn&Fuzz: Machine Learning for Input Fuzzing

Patrice Godefroid; Hila Peleg; Rishabh Singh

arXiv:1701.07232 · cs.AI · January 26, 2017 · 2 cites

TL;DR

This paper introduces a machine-learning approach that automates the generation of input grammars for fuzzing. It demonstrates the approach on the PDF format and a large, complex PDF parser, and examines the tension between learning the structure of inputs and breaking that structure to discover bugs.
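To make the "breaking the structure" side concrete, here is a minimal sketch of plain random mutation fuzzing, which knows nothing about the input format. It assumes a hypothetical parser callback that raises an exception on malformed input; a real harness would instead watch the target for crashes or hangs. The names `mutate`, `fuzz` and `toy_parse`, and the PDF-like sample object, are illustrative and not taken from the paper.

```python
import random
from typing import Callable, List


def mutate(sample: str, rate: float, rng: random.Random) -> str:
    """Replace each character of `sample`, with probability `rate`,
    by a random printable ASCII character."""
    out = []
    for ch in sample:
        if rng.random() < rate:
            out.append(chr(rng.randint(32, 126)))  # printable ASCII range
        else:
            out.append(ch)
    return "".join(out)


def fuzz(sample: str, parse: Callable[[str], None], iterations: int,
         rate: float = 0.05, seed: int = 0) -> List[str]:
    """Feed `iterations` mutated copies of `sample` to `parse` and
    return the inputs on which it raised an exception."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        candidate = mutate(sample, rate, rng)
        try:
            parse(candidate)
        except Exception:
            failures.append(candidate)
    return failures


def toy_parse(data: str) -> None:
    """Hypothetical stand-in for a real parser: rejects badly framed
    objects and unbalanced dictionary delimiters."""
    if not (data.startswith("1 0 obj") and data.endswith("endobj")):
        raise ValueError("bad object framing")
    if data.count("<<") != data.count(">>"):
        raise ValueError("unbalanced dictionary")


if __name__ == "__main__":
    sample = "1 0 obj\n<< /Type /Page >>\nendobj"
    rejected = fuzz(sample, toy_parse, iterations=1000)
    print(f"{len(rejected)} of 1000 mutated inputs were rejected")
```

Many mutations are simply rejected by the framing checks, which is why format-agnostic fuzzing tends to exercise mostly shallow parsing code; learning the input structure is what lets generated inputs reach further.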

Contribution

It presents a new learn&fuzz algorithm that uses an input probability distribution, learnt by a neural-network-based model, to guide fuzzing, with the aim of finding security bugs in input-parsing code.
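The contribution rests on letting a learnt distribution decide where to fuzz. Below is a sketch of one such policy, under stated assumptions: the model is any callable that returns next-character probabilities for a prefix, and the thresholds `t_fuzz` and `p_t`, the `TEMPLATE` string and `toy_model` are hypothetical. The heuristic shown injects the least likely character only at positions where the model is confident about the sampled character, so most of the generated object stays well formed. It is one plausible policy, not a transcription of the paper's algorithm.

```python
import random
from typing import Callable, Dict

# A model maps a prefix to a next-character distribution. In the paper this
# role is played by a trained neural network; here it is any callable.
Model = Callable[[str], Dict[str, float]]

TEMPLATE = "obj << /Type /Page >> endobj"  # hypothetical well-formed object


def toy_model(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a learnt model: 0.9 on the next template
    character (by position), the rest on two rare characters."""
    nxt = TEMPLATE[len(prefix)] if len(prefix) < len(TEMPLATE) else " "
    return {nxt: 0.9, "#": 0.06, "%": 0.04}


def guided_sample(model: Model, start: str, stop: str, max_len: int,
                  t_fuzz: float, p_t: float, rng: random.Random) -> str:
    """Sample characters from `model`, occasionally swapping in the least
    likely character at positions where the model is confident."""
    seq = start
    while not seq.endswith(stop) and len(seq) < max_len:
        probs = model(seq)
        chars = list(probs)
        c = rng.choices(chars, weights=[probs[k] for k in chars])[0]
        # Fuzz only if a coin flip exceeds t_fuzz AND the model was
        # confident about the sampled character (p(c) > p_t).
        if rng.random() > t_fuzz and probs[c] > p_t:
            c = min(chars, key=lambda k: probs[k])  # least likely character
        seq += c
    return seq


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(3):
        print(repr(guided_sample(toy_model, "obj", "endobj", 60,
                                 t_fuzz=0.9, p_t=0.8, rng=rng)))
```

Raising `t_fuzz` makes fuzzing rarer; lowering `p_t` lets it fire at less certain positions as well.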

Findings

01. Effective grammar generation for complex formats such as PDF
02. Improved fuzzing guidance using learned input distributions
03. A successful case study on the PDF parser embedded in Microsoft's Edge browser

Abstract

Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss (and measure) the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability…
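The abstract's core idea is learning a generative model of inputs from sample files and then sampling new inputs from it. The paper does this with neural networks; the sketch below swaps in a much simpler character-level n-gram model (a different technique, chosen only to stay dependency-free) that exposes the same interface: probabilities for the next character given a prefix. The function names, the start/end markers and the PDF-like samples are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

START, END = "\x02", "\x03"  # padding and end-of-object markers


def train_char_ngram(samples: List[str], order: int) -> Dict[str, Counter]:
    """Count, for every length-`order` context, which character follows it."""
    if order < 1:
        raise ValueError("order must be >= 1")
    counts: Dict[str, Counter] = defaultdict(Counter)
    for s in samples:
        padded = START * order + s + END
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def next_char_probs(counts: Dict[str, Counter], context: str) -> Dict[str, float]:
    """Normalise the follower counts of `context` into probabilities."""
    followers = counts.get(context)  # .get does not insert into the defaultdict
    if not followers:
        return {}
    total = sum(followers.values())
    return {ch: n / total for ch, n in followers.items()}


def generate(counts: Dict[str, Counter], order: int, rng: random.Random,
             max_len: int = 200) -> str:
    """Sample a new input character by character until END or max_len."""
    context, out = START * order, []
    while len(out) < max_len:
        probs = next_char_probs(counts, context)
        if not probs:  # unseen context: stop early
            break
        ch = rng.choices(list(probs), weights=list(probs.values()))[0]
        if ch == END:
            break
        out.append(ch)
        context = (context + ch)[-order:]  # slide the context window
    return "".join(out)


if __name__ == "__main__":
    samples = [
        "1 0 obj << /Type /Page >> endobj",
        "2 0 obj << /Type /Font >> endobj",
    ]
    model = train_char_ngram(samples, order=4)
    rng = random.Random(7)
    print([generate(model, 4, rng) for _ in range(3)])
```

A larger `order` reproduces the training samples more faithfully, while a smaller one recombines fragments of different samples more freely; this is a small-scale view of the learning-versus-fuzzing tension the abstract describes.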

Code & Models

Repositories

m-zakeri/iust_deep_fuzz (TensorFlow)

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques