A comparative analysis of state-of-the-art SQL-on-Hadoop systems for interactive analytics

Ashish Tapdiya; Daniel Fabbri

arXiv:1804.00224·cs.DB·April 3, 2018

TL;DR

This paper compares four state-of-the-art SQL-on-Hadoop systems (Impala, Drill, Spark SQL, and Phoenix) on the Web Data Analytics micro benchmark and the TPC-H benchmark, run on Amazon EC2, to understand their strengths, weaknesses, and execution characteristics.

Contribution

It provides a detailed comparative analysis of these systems' architectures, performance trade-offs, and query execution behaviors using standardized benchmarks.

Findings

01

On TPC-H, Impala outperforms the other systems by 4.41x - 6.65x in the text format

02

In the parquet format, trade-offs exist: each system performs best on a subset of the TPC-H queries

03

Execution profiles reveal bottlenecks and performance variations

Abstract

Hadoop is emerging as the primary data hub in enterprises, and SQL represents the de facto language for data analysis. This combination has led to the development of a variety of SQL-on-Hadoop systems in use today. While the various SQL-on-Hadoop systems target the same class of analytical workloads, their different architectures, design decisions and implementations impact query performance. In this work, we perform a comparative analysis of four state-of-the-art SQL-on-Hadoop systems (Impala, Drill, Spark SQL and Phoenix) using the Web Data Analytics micro benchmark and the TPC-H benchmark on the Amazon EC2 cloud platform. The TPC-H experiment results show that, although Impala outperforms other systems (4.41x - 6.65x) in the text format, trade-offs exist in the parquet format, with each system performing best on subsets of queries. A comprehensive analysis of execution profiles…

Code & Models

Repositories

meacial/e-book