Benchmarking Deep Learning Fuzzers
Nima Shiri Harzevili, Hung Viet Pham, Song Wang

TL;DR
This paper conducts the first comprehensive empirical evaluation of state-of-the-art deep learning fuzzers using a new benchmark dataset of 627 real-world bugs from TensorFlow and PyTorch, revealing their limitations and proposing improvements.
Contribution
It introduces an extensive DL bug benchmark dataset, evaluates existing fuzzers' effectiveness, analyzes factors affecting bug detection, and proposes a simple corner case generator to enhance fuzzing performance.
Findings
Most applicable bugs (235 of 257) are not detected by any of the three fuzzers.
A simple corner case generator improves bug detection by 5-6 bugs per fuzzer.
Four major, broad, and common factors affect the fuzzers' ability to detect real bugs.
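To make the corner case generator idea concrete, here is a minimal sketch of what such a component could look like. It is not the authors' implementation: the function names (`corner_cases`, `mutate_call`), the type categories, and the specific values are all assumptions chosen for illustration. The sketch takes a recorded API call and produces variants in which one argument at a time is replaced by an edge value such as an empty shape, an extreme integer, NaN, or None.

```python
import math
from typing import Any, Dict, List


def corner_cases(param_type: str) -> List[Any]:
    """Return illustrative edge values for a parameter type.

    The type categories and values are hypothetical, not taken from the paper.
    """
    cases: Dict[str, List[Any]] = {
        "int": [0, -1, 1, 2**31 - 1, -(2**31), 2**63 - 1],
        "float": [0.0, -0.0, math.inf, -math.inf, math.nan, 1e38],
        "shape": [[], [0], [0, 0], [1], [-1], [2**31]],
        "list": [[], [None], [0] * 1000],
        "str": ["", " ", "\x00"],
    }
    # None is tried for every type, including unknown ones.
    return cases.get(param_type, []) + [None]


def mutate_call(args: Dict[str, Any], types: Dict[str, str]) -> List[Dict[str, Any]]:
    """Build variants of a recorded call, replacing one argument at a time."""
    variants = []
    for name, ptype in types.items():
        for value in corner_cases(ptype):
            variant = dict(args)  # shallow copy keeps the other arguments intact
            variant[name] = value
            variants.append(variant)
    return variants
```

In a fuzzing loop, each variant would be passed to the API under test while watching for crashes or unexpected exceptions. Replacing one argument at a time keeps a failing input easy to attribute to a single parameter.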
Abstract
In this work, we set out to conduct the first ground-truth empirical evaluation of state-of-the-art DL fuzzers. Specifically, we first manually created an extensive DL bug benchmark dataset, which includes 627 real-world DL bugs from TensorFlow and PyTorch libraries reported by users between 2020 and 2022. Then we run three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel, and DocTer, on the benchmark by following their instructions. We find that these fuzzers are unable to detect many real bugs collected in our benchmark dataset. Specifically, most (235) of the 257 applicable bugs cannot be detected by any fuzzer. Our systematic analysis further identifies four major, broad, and common factors that affect these fuzzers' ability to detect real bugs. These findings present opportunities to improve the performance of the fuzzers in future work. As a proof of concept, we propose a…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Adversarial Robustness in Machine Learning
