Characterizing BigBench queries, Hive, and Spark in multi-cloud environments

Nicolas Poggi; Alejandro Montero; David Carrera

arXiv:2007.02800·cs.DC·July 7, 2020

TL;DR

This paper benchmarks and characterizes BigBench queries and the performance of Hive and Spark in multi-cloud PaaS environments, providing insights into resource requirements, scalability, and configuration needs.

Contribution

It offers a comprehensive analysis of BigBench queries and compares Hive and Spark performance across major cloud providers, highlighting configuration impacts and resource consumption.

Findings

01

Hive and Spark have distinct resource usage patterns.

02

Performance varies significantly with data scale and configuration.

03

Proper tuning is essential for optimal performance in cloud environments.

Abstract

BigBench is the new standard (TPCx-BB) for benchmarking and testing Big Data systems. The TPCx-BB specification describes several business use cases -- queries -- which require a broad combination of data extraction techniques including SQL, Map/Reduce (M/R), user code (UDF), and Machine Learning to fulfill them. However, there is currently no widespread knowledge of the different resource requirements and expected performance of each query, as is the case for more established benchmarks. At the same time, cloud providers currently offer convenient on-demand managed big data clusters (PaaS) with a pay-as-you-go model. In PaaS, analytical engines such as Hive and Spark come ready to use, with a general-purpose configuration and upgrade management. The study characterizes both the BigBench queries and the out-of-the-box performance of Spark and Hive versions in the cloud. At the same…
