Benchmarking Deep Learning Fuzzers

Nima Shiri Harzevili; Hung Viet Pham; Song Wang

arXiv:2310.06912·cs.SE·October 12, 2023

TL;DR

This paper conducts the first comprehensive empirical evaluation of state-of-the-art deep learning fuzzers using a new benchmark dataset of 627 real-world bugs from TensorFlow and PyTorch, revealing their limitations and proposing improvements.

Contribution

It introduces an extensive DL bug benchmark dataset, evaluates existing fuzzers' effectiveness, analyzes factors affecting bug detection, and proposes a simple corner case generator to enhance fuzzing performance.

Findings

01

Most (235) of the 257 applicable benchmark bugs cannot be detected by any of the three evaluated fuzzers.

02

A simple corner case generator improves bug detection by 5-6 bugs per fuzzer.

03

Four major, broad, and common factors limit the fuzzers' ability to detect real bugs.
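To make the idea of a corner case generator concrete, here is a minimal sketch. It is not the authors' implementation: the value pools, the argument kinds (`int`, `float`, `shape`, `str`, `list`), and the function name `corner_case_inputs` are all assumptions chosen for illustration. It replaces one argument at a time with an extreme or degenerate value, which is one plausible way a fuzzer could be steered toward edge-case inputs.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical pools of corner-case values per argument kind.
# These are illustrative assumptions, not values taken from the paper.
CORNER_CASES: Dict[str, List[Any]] = {
    "int": [0, -1, 1, 2**31 - 1, -(2**31), 2**63 - 1],
    "float": [0.0, -0.0, float("inf"), float("-inf"), float("nan"), 1e38],
    "shape": [(), (0,), (0, 0), (1,), (2**31,)],
    "str": ["", " ", "\x00"],
    "list": [[], [None]],
}


def corner_case_inputs(
    signature: Dict[str, str], defaults: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Yield argument dicts that swap exactly one argument for a corner case.

    `signature` maps argument names to a kind in CORNER_CASES;
    `defaults` supplies a known-valid value for every argument.
    """
    for name, kind in signature.items():
        for value in CORNER_CASES.get(kind, []):
            args = dict(defaults)  # copy so the valid baseline is never mutated
            args[name] = value
            yield args


# Example: a hypothetical API taking a tensor shape and a fill value.
signature = {"size": "shape", "fill": "float"}
defaults = {"size": (2, 2), "fill": 1.0}
for candidate in corner_case_inputs(signature, defaults):
    print(candidate)
```

Changing one argument at a time keeps every other input valid, so a crash can be attributed to the single corner-case value that was substituted.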

Abstract

In this work, we set out to conduct the first ground-truth empirical evaluation of state-of-the-art DL fuzzers. Specifically, we first manually created an extensive DL bug benchmark dataset, which includes 627 real-world DL bugs from the TensorFlow and PyTorch libraries reported by users between 2020 and 2022. We then ran three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel, and DocTer, on the benchmark by following their instructions. We find that these fuzzers are unable to detect many of the real bugs collected in our benchmark dataset. Specifically, most (235) of the 257 applicable bugs cannot be detected by any fuzzer. Our systematic analysis further identifies four major, broad, and common factors that affect these fuzzers' ability to detect real bugs. These findings present opportunities to improve the performance of the fuzzers in future work. As a proof of concept, we propose a…
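The headline numbers in the abstract can be turned into rates with a few lines of arithmetic. The sketch below uses only the three figures stated above (627 benchmark bugs, 257 applicable, 235 undetected); the function name `detection_summary` is hypothetical.

```python
def detection_summary(total: int, applicable: int, undetected: int) -> dict:
    """Derive simple rates from the counts reported in the abstract."""
    detected = applicable - undetected
    return {
        "detected": detected,
        "undetected_rate": undetected / applicable,  # share of applicable bugs missed
        "detected_rate": detected / applicable,      # share of applicable bugs found
        "applicable_share": applicable / total,      # share of the benchmark that applies
    }


summary = detection_summary(total=627, applicable=257, undetected=235)
print(summary["detected"])                   # 22
print(f"{summary['undetected_rate']:.1%}")   # 91.4%
print(f"{summary['applicable_share']:.1%}")  # 41.0%
```

In other words, the three fuzzers together find 22 of the 257 applicable bugs, and the applicable bugs are roughly two fifths of the full benchmark.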

Taxonomy

Topics: Software Testing and Debugging Techniques · Software Engineering Research · Adversarial Robustness in Machine Learning