The Implications of Diverse Applications and Scalable Data Sets in Benchmarking Big Data Systems
Zhen Jia, Runlin Zhou, Chunge Zhu, Lei Wang, Wanling Gao, Yingjie Shi, Jianfeng Zhan, Lixin Zhang

TL;DR
This paper investigates how diverse applications and scalable data sets influence the performance benchmarking of big data systems, emphasizing the need for scalable and varied data sets in benchmarks.
Contribution
It highlights the importance of including scalable data volumes and diverse applications in big data system benchmarks, supported by experimental findings.
Findings
Data scale significantly affects system performance.
Different applications show varied performance trends with data growth.
Scalable and diverse data sets are essential for effective benchmarking.
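The findings above suggest measuring each application at several data volumes rather than at one fixed size. A minimal sketch of such a measurement loop follows. It is not the paper's methodology: the two workloads (word_count, sort_records), the synthetic generator make_records, and the function run_scaling_benchmark are hypothetical stand-ins chosen for illustration, and the data volumes are far smaller than anything a real big data benchmark would use.

```python
import time
from collections import Counter


def word_count(records):
    """Hypothetical workload: count whitespace-separated tokens."""
    counts = Counter()
    for line in records:
        counts.update(line.split())
    return counts


def sort_records(records):
    """Hypothetical workload: sort records lexicographically."""
    return sorted(records)


def make_records(n):
    """Generate n deterministic synthetic records (an assumption, not real data)."""
    return [f"key{(i * 7919) % 1000} value{i % 13}" for i in range(n)]


def run_scaling_benchmark(workloads, scales, make_data=make_records):
    """Run every workload at every data scale.

    Returns a dict mapping (workload_name, scale) to throughput in
    records per second, so trends across scales can be compared
    per workload.
    """
    results = {}
    for n in scales:
        data = make_data(n)
        for name, fn in workloads.items():
            start = time.perf_counter()
            fn(data)
            elapsed = time.perf_counter() - start
            # Guard against a zero reading on very small inputs.
            results[(name, n)] = n / elapsed if elapsed > 0 else float("inf")
    return results


if __name__ == "__main__":
    workloads = {"word_count": word_count, "sort": sort_records}
    for (name, n), rate in sorted(run_scaling_benchmark(workloads, [1_000, 10_000, 100_000]).items()):
        print(f"{name:>10} n={n:>7} {rate:,.0f} records/s")
```

Keeping throughput keyed by both workload and scale makes it straightforward to check whether different applications follow different trends as the data grows, which is the comparison the findings describe.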
Abstract
We live in an era of big data, and big data applications are becoming increasingly pervasive. How to benchmark data center computer systems running big data applications (big data systems for short) is a hot topic. In this paper, we focus on measuring the performance impacts of diverse applications and scalable volumes of data sets on big data systems. For four typical data analysis applications, an important class of big data applications, we find two major results through experiments. First, the data scale has a significant impact on the performance of big data systems, so we must provide scalable volumes of data sets in big data benchmarks. Second, for the four applications, even though all of them use simple algorithms, the performance trends differ with increasing data scales, and hence we must consider not only variety of data sets but also variety of applications in…
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · IoT and Edge/Fog Computing
