Learn&Fuzz: Machine Learning for Input Fuzzing
Patrice Godefroid, Hila Peleg, Rishabh Singh

TL;DR
This paper introduces a machine-learning approach that automates input-grammar generation for fuzzing. It is demonstrated on PDF inputs and a complex PDF parser, and it examines the tension between learning the structure of well-formed inputs and breaking that structure to find bugs.
Contribution
It presents a new learn&fuzz algorithm that uses neural-network-based statistical models, learned from sample inputs, to generate an input grammar and guide fuzzing, with the goal of finding security vulnerabilities in input-parsing code.
Findings
Effective grammar generation for complex formats like PDF
Improved fuzzing guidance using learned input distributions
Detailed case study on the PDF parser embedded in Microsoft's Edge browser
Abstract
Fuzzing consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code. In this paper, we show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. We present a detailed case study with a complex input format, namely PDF, and a large complex security-critical parser for this format, namely, the PDF parser embedded in Microsoft's new Edge browser. We discuss (and measure) the tension between conflicting learning and fuzzing goals: learning wants to capture the structure of well-formed inputs, while fuzzing wants to break that structure in order to cover unexpected code paths and find bugs. We also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability…
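The tension the abstract describes, between reproducing well-formed structure and deliberately breaking it, can be illustrated with a small sketch. This is not the paper's algorithm: the abstract is cut off before the algorithm is described, so everything below is an assumption made for illustration. A character bigram model stands in for the neural network, and the function names, the `fuzz_rate` and `confidence` parameters, the "replace a confident prediction with the least likely character" rule, and the toy training strings are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_bigram(samples):
    """Count next-character frequencies per character.

    A bigram table is a simple stand-in for a learned model; the paper
    uses neural-network-based models instead.
    """
    counts = defaultdict(Counter)
    for s in samples:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return counts


def next_distribution(counts, prev):
    """Return {char: probability} for the character following `prev`.

    Returns an empty dict if `prev` was never followed by anything.
    """
    total = sum(counts[prev].values())
    return {c: n / total for c, n in counts[prev].items()}


def generate(counts, start, length, fuzz_rate, confidence, rng):
    """Sample up to `length` characters, occasionally breaking structure.

    Hypothetical rule: when the model is confident about the next
    character (top probability > `confidence`), then with probability
    `fuzz_rate` emit the least likely known successor instead.
    """
    out = [start]
    for _ in range(length - 1):
        dist = next_distribution(counts, out[-1])
        if not dist:
            break  # no known successor: stop early
        best = max(dist, key=dist.get)
        if dist[best] > confidence and rng.random() < fuzz_rate:
            # "Fuzz" step: pick the least likely successor. If only one
            # successor is known, this is the same as the best one.
            out.append(min(dist, key=dist.get))
        else:
            # "Learn" step: sample according to the learned distribution.
            chars = list(dist)
            weights = [dist[c] for c in chars]
            out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)


# Toy PDF-dictionary-like strings (made up for this sketch).
samples = ["<< /Type /Page /Count 3 >>", "<< /Type /Font /Count 12 >>"]
model = train_bigram(samples)
rng = random.Random(0)
print(generate(model, "<", 30, fuzz_rate=0.0, confidence=0.9, rng=rng))
print(generate(model, "<", 30, fuzz_rate=0.5, confidence=0.9, rng=rng))
```

With `fuzz_rate=0.0` the generator only follows the learned distribution; raising `fuzz_rate` trades well-formedness for unusual character sequences, which is the trade-off the abstract says the paper measures.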
Taxonomy
Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Software Testing and Debugging Techniques
