BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework
Yuqing Zhu, Jianfeng Zhan, Chuliang Weng, Raghunath Nambiar, Jinchao Zhang, Xingzhen Chen, and Lei Wang

TL;DR
BigOP is a comprehensive benchmarking framework for big data systems. It abstracts representative operations and workload patterns, which enables automatic test generation and evaluation across multiple platforms such as Hadoop, Spark, and MySQL Cluster.
Contribution
Introduces BigOP, an end-to-end big data benchmarking framework with an abstraction model for operations and workloads, supporting automatic test generation for diverse systems.
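One way to picture such an abstraction model is as enumerating combinations of operations, workload patterns, and data types. The following is a minimal Python sketch under stated assumptions. `Operation`, `Pattern`, `TestCase`, and `generate_tests` are hypothetical names. The three patterns and the example operations are illustrative and are not necessarily BigOP's actual operation sets or patterns. The sketch only produces abstract test specifications. Translating them into Hadoop, Spark, or MySQL Cluster jobs is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import permutations
from typing import List, Tuple


class Pattern(Enum):
    """Hypothetical workload patterns; the paper's actual set may differ."""
    SINGLE = "single-operation"
    MULTI = "multi-operation"
    ITERATIVE = "iterative"


@dataclass(frozen=True)
class Operation:
    name: str   # illustrative operation name, e.g. "filter" or "join"
    arity: int  # number of input data sets the operation consumes


@dataclass(frozen=True)
class TestCase:
    pattern: Pattern
    operations: Tuple[Operation, ...]
    data_type: str  # illustrative data type, e.g. "table" or "text"


def generate_tests(ops: List[Operation], data_types: List[str],
                   chain_len: int = 2) -> List[TestCase]:
    """Enumerate abstract test cases for every data type.

    SINGLE runs one operation; MULTI chains `chain_len` distinct operations
    in every order; ITERATIVE repeats one operation `chain_len` times
    (a simplification: real iterative workloads may loop until convergence).
    """
    tests: List[TestCase] = []
    for dt in data_types:
        for op in ops:
            tests.append(TestCase(Pattern.SINGLE, (op,), dt))
            tests.append(TestCase(Pattern.ITERATIVE, (op,) * chain_len, dt))
        for combo in permutations(ops, chain_len):
            tests.append(TestCase(Pattern.MULTI, combo, dt))
    return tests
```

For example, three operations and two data types yield 24 test cases per run: per data type, 3 single-operation, 3 iterative, and 6 ordered multi-operation chains. Keeping the generated cases platform-neutral is the design choice that would let one specification be replayed against several systems.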
Findings
Implemented an automatic test generation tool.
Reported benchmarking results for Hadoop, Spark, and MySQL Cluster.
Demonstrated comprehensive workload coverage across different data types.
Abstract
Big data is considered a proprietary asset of companies, organizations, and even nations. Turning big data into real value requires the support of big data systems. A variety of commercial and open-source products have been released for big data storage and processing. While big data users face the choice of which system best suits their needs, big data system developers face the question of how to evaluate their systems against general big data processing needs. System benchmarking is the classic way of meeting both demands. However, existing big data benchmarks either fail to represent the variety of big data processing requirements or target only one specific platform, e.g., Hadoop. In this paper, with our industrial partners, we present BigOP, an end-to-end system benchmarking framework, featuring the abstraction of representative Operation sets, workload…
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Parallel Computing and Optimization Techniques
