PerfGen: Automated Performance Benchmark Generation for Big Data Analytics

Jiyuan Wang; Jason Teoh; Muhammad Ali Gulzar; Qian Zhang; Miryung Kim

arXiv:2412.04687·cs.SE·December 9, 2024

TL;DR

PerfGen is an automated tool that generates inputs to trigger performance issues in big data analytics, using phased fuzzing, guidance metrics, and skew-inspired mutations to efficiently identify problematic workloads.

Contribution

The paper introduces a phased fuzzing approach that combines guidance metrics with input transformations to generate inputs that trigger performance symptoms in big data analytics.

Findings

01. Achieves at least 11x speedup over traditional fuzzing.

02. Generates workload inputs in less than 0.004% of baseline iterations.

03. Successfully identifies performance symptoms in four case studies.

Abstract

Many symptoms of poor performance in big data analytics, such as computational skews, data skews, and memory skews, are input dependent. However, because inputs that can trigger such performance symptoms are scarce, big data analytics are hard to debug and test. We design PerfGen to automatically generate inputs for performance testing. PerfGen overcomes three challenges that arise when automated fuzz testing is applied naively to performance testing. First, typical greybox fuzzing relies on coverage as a guidance signal and thus is unlikely to trigger interesting performance behavior. Therefore, PerfGen provides performance monitor templates that a user can extend to serve as a set of guidance metrics for greybox fuzzing. Second, performance symptoms may occur at an intermediate or later stage of a big data analytics pipeline. Thus, PerfGen uses a phased fuzzing…
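
The first challenge in the abstract, replacing coverage with a performance metric as the fuzzing guidance signal, can be illustrated with a minimal sketch. This is not PerfGen's implementation: the names `data_skew`, `mutate`, and `fuzz` are hypothetical, partitioning is simulated in plain Python rather than run on a real analytics engine, and the hill-climbing loop is a stand-in for a real greybox fuzzer. The monitor here measures data skew as the largest partition size divided by the mean partition size, and the loop keeps any mutated input that does not lower that score.

```python
import random
from collections import Counter


def partition_sizes(records, num_partitions=4):
    """Count how many (key, value) records land in each hash partition."""
    # Keys are small ints here, so hash(key) is stable across runs.
    counts = Counter(hash(key) % num_partitions for key, _ in records)
    return [counts.get(p, 0) for p in range(num_partitions)]


def data_skew(records, num_partitions=4):
    """Hypothetical monitor: largest partition size divided by the mean size."""
    sizes = partition_sizes(records, num_partitions)
    mean = sum(sizes) / num_partitions
    return max(sizes) / mean if mean else 0.0


def mutate(records, rng):
    """Assumed skew-style mutation: copy one record's key onto another record."""
    mutated = list(records)
    src = rng.randrange(len(mutated))
    dst = rng.randrange(len(mutated))
    mutated[dst] = (mutated[src][0], mutated[dst][1])
    return mutated


def fuzz(seed, metric, threshold, max_iters=10_000, rng_seed=0):
    """Keep mutating the best input until the monitor reaches the threshold."""
    rng = random.Random(rng_seed)
    best, best_score = seed, metric(seed)
    for iteration in range(max_iters):
        if best_score >= threshold:
            return best, best_score, iteration
        candidate = mutate(best, rng)
        score = metric(candidate)
        # Guidance signal: accept inputs that keep or raise the skew metric,
        # regardless of whether they exercise new code.
        if score >= best_score:
            best, best_score = candidate, score
    return best, best_score, max_iters


if __name__ == "__main__":
    # A perfectly balanced seed: 16 distinct keys over 4 partitions.
    seed = [(k, 1) for k in range(16)]
    best, score, iters = fuzz(seed, data_skew, threshold=2.0)
    print(f"skew={score:.2f} after {iters} iterations")
```

Swapping `data_skew` for another function of the same shape (for example, one that times a per-record computation) changes what the loop searches for without touching `fuzz`, which is the sense in which a monitor acts as an extensible guidance metric in this sketch.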

Taxonomy

Topics: Cloud Computing and Resource Management