SoK: Prudent Evaluation Practices for Fuzzing
Moritz Schloegel, Nils Bars, Nico Schiller, Lukas Bernhard, Tobias Scharnowski, Addison Crump, Arash Ale Ebrahim, Nicolai Bissantz, Marius Muench, Thorsten Holz

TL;DR
This paper reviews and analyzes how recent fuzzing research evaluates its methods, highlighting gaps between recommended practices and actual implementation, and emphasizing the importance of rigorous, reproducible evaluation protocols.
Contribution
The paper systematically examines 150 fuzzing papers to assess how closely they adhere to evaluation guidelines, revealing widespread neglect of statistical rigor and of systematic errors in the evaluation setup.
Findings
Many studies do not follow recommended statistical testing practices.
Reproducibility and systematic error considerations are often overlooked.
Evaluation practices vary widely across recent fuzzing research.
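To make the point about statistical testing concrete, here is a minimal sketch of how two fuzzers might be compared across repeated trials. The summary above does not name a specific test, so this is an assumption: the sketch uses a two-sided permutation test on the difference of medians, plus the Vargha-Delaney A12 effect size. The function names and the coverage numbers are hypothetical and for illustration only; they are not taken from the paper.

```python
import random
from statistics import median


def a12(xs, ys):
    """Vargha-Delaney A12 effect size.

    Estimates the probability that a value drawn from xs is larger than
    one drawn from ys, counting ties as half. 0.5 means no difference.
    """
    wins = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    return wins / (len(xs) * len(ys))


def permutation_p_value(xs, ys, rounds=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of medians.

    Shuffles the pooled observations and counts how often a random split
    is at least as extreme as the observed one. The seed is fixed so the
    result is repeatable.
    """
    rng = random.Random(seed)
    observed = abs(median(xs) - median(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(median(pooled[:len(xs)]) - median(pooled[len(xs):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


if __name__ == "__main__":
    # Hypothetical branch-coverage counts from 10 trials per fuzzer.
    fuzzer_a = [1510, 1498, 1532, 1475, 1520, 1541, 1489, 1503, 1526, 1517]
    fuzzer_b = [1480, 1495, 1462, 1501, 1470, 1488, 1459, 1492, 1477, 1484]
    print("A12 effect size:", a12(fuzzer_a, fuzzer_b))
    print("p-value:", permutation_p_value(fuzzer_a, fuzzer_b))
```

Reporting an effect size alongside the p-value shows how large a difference is, not just whether one is detectable. Running several trials per fuzzer is what makes either number meaningful, since a single run cannot separate a real improvement from randomness.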
Abstract
Fuzzing has proven to be a highly effective approach to uncover software bugs over the past decade. After AFL popularized the groundbreaking concept of lightweight coverage feedback, the field of fuzzing has seen a vast amount of scientific work proposing new techniques, improving methodological aspects of existing strategies, or porting existing methods to new domains. All such work must demonstrate its merit by showing its applicability to a problem, measuring its performance, and often showing its superiority over existing works in a thorough, empirical evaluation. Yet, fuzzing is highly sensitive to its target, environment, and circumstances, e.g., randomness in the testing process. After all, relying on randomness is one of the core principles of fuzzing, governing many aspects of a fuzzer's behavior. Combined with the often highly difficult to control environment, the…
