Efficient Fuzz Testing for Apache Spark Using Framework Abstraction

Qian Zhang; Jiyuan Wang; Muhammad Ali Gulzar; Rohan Padhye; Miryung Kim

arXiv:2103.05118·cs.SE·March 10, 2021

TL;DR

This paper introduces BigFuzz, a novel fuzz testing tool for Apache Spark that uses framework abstraction and schema-aware mutations to efficiently detect errors in data-intensive applications.

Contribution

The paper presents BigFuzz, a framework-aware fuzz testing approach that improves testing efficiency and error detection for data-intensive scalable computing (DISC) applications, such as those written for Apache Spark.

Findings

01

Up to 1477x speedup in fuzzing time over random fuzzing

02

Up to 271% increase in application code coverage

03

Up to 157% improvement in application error detection

Abstract

The emerging data-intensive applications are increasingly dependent on data-intensive scalable computing (DISC) systems, such as Apache Spark, to process large data. Despite their popularity, DISC applications are hard to test. In recent years, fuzz testing has been remarkably successful; however, it is nontrivial to apply such traditional fuzzing to big data analytics directly because: (1) the long latency of DISC systems prohibits the applicability of fuzzing, and (2) conventional branch coverage is unlikely to identify application logic from the DISC framework implementation. We devise a novel fuzz testing tool called BigFuzz that automatically generates concrete data for an input Apache Spark program. The key essence of our approach is that we abstract the dataflow behavior of the DISC framework with executable specifications and we design schema-aware mutations based on common…
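The two ideas named in the abstract can be illustrated with a minimal Python sketch. This is not BigFuzz's implementation: the operator stand-ins (`map_op`, `filter_op`, `reduce_by_key`), the three-column schema, the toy `program`, and the three mutation operators are all hypothetical choices made for illustration. The sketch assumes that "framework abstraction" can be approximated by replacing dataflow operators with plain in-memory functions, and that "schema-aware" means mutating one column at a time according to its declared type.

```python
import random

# Hypothetical schema for one CSV row: (zip code: int, age: int, name: str).
SCHEMA = [int, int, str]

# --- Framework abstraction (sketch) ---
# Plain in-memory stand-ins for dataflow operators, so each fuzzing
# iteration runs the application logic without starting a cluster.
def map_op(rows, f):
    return [f(r) for r in rows]

def filter_op(rows, pred):
    return [r for r in rows if pred(r)]

def reduce_by_key(pairs, f):
    out = {}
    for k, v in pairs:
        out[k] = f(out[k], v) if k in out else v
    return out

def parse(line):
    cols = line.split(",")
    return (int(cols[0]), int(cols[1]), cols[2])

def program(lines):
    """Toy application under test: count adults per zip code."""
    rows = map_op(lines, parse)
    adults = filter_op(rows, lambda r: r[1] >= 18)
    keyed = map_op(adults, lambda r: (r[0], 1))
    return reduce_by_key(keyed, lambda a, b: a + b)

# --- Schema-aware mutations (sketch; illustrative operators only) ---
def mutate_value(line, rng):
    """Replace one column with a fresh value of the same declared type."""
    cols = line.split(",")
    i = rng.randrange(len(cols))
    if SCHEMA[i] is int:
        cols[i] = str(rng.randint(-1000, 1000))
    else:
        cols[i] = "".join(rng.choice("abc") for _ in range(3))
    return ",".join(cols)

def mutate_type(line, rng):
    """Put a non-numeric token into an int column."""
    cols = line.split(",")
    int_cols = [i for i, t in enumerate(SCHEMA) if t is int]
    cols[rng.choice(int_cols)] = "x!"
    return ",".join(cols)

def drop_column(line, rng):
    """Remove one column so the row no longer matches the schema."""
    cols = line.split(",")
    del cols[rng.randrange(len(cols))]
    return ",".join(cols)

MUTATIONS = [mutate_value, mutate_type, drop_column]

def fuzz(seed_lines, trials=200, seed=0):
    """Mutate one row per trial and record inputs that crash the program."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        lines = list(seed_lines)
        i = rng.randrange(len(lines))
        lines[i] = rng.choice(MUTATIONS)(lines[i], rng)
        try:
            program(lines)
        except (ValueError, IndexError) as e:
            failures.append((lines[i], type(e).__name__))
    return failures
```

Calling `fuzz(["90001,20,ann", "90001,15,bob", "90002,30,cy"])` returns the mutated rows that made the toy program raise, paired with the exception name. Because every mutation keeps the row in comma-separated form, each trial exercises the parsing and filtering logic rather than being rejected as unreadable input, which is the intuition behind mutating at the column level instead of flipping random bytes.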

Taxonomy

Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Advanced Malware Detection Techniques