Benchmarking DataStax Enterprise/Cassandra with HiBench
Todor Ivanov, Raik Niemann, Sead Izberovic, Marten Rosselli, Karsten Tolle, Roberto V. Zicari

TL;DR
This paper benchmarks DataStax Enterprise/Cassandra's analytical capabilities using standard Hadoop workloads, demonstrating its ability to run Hadoop applications seamlessly through the Cassandra File System without modifications.
Contribution
It provides an evaluation of DSE's Hadoop compatibility and performance using HiBench benchmarks, highlighting its seamless integration via CFS.
Findings
DSE successfully executes Hadoop workloads without adaptation.
CFS enables seamless Hadoop application integration.
Performance results demonstrate DSE's analytical capabilities.
Abstract
This report evaluates the new analytical capabilities of DataStax Enterprise (DSE) [1] using standard Hadoop workloads. In particular, we run experiments with CPU-bound and I/O-bound micro-benchmarks as well as OLAP-style analytical query workloads. The tests are intended to show that DSE can successfully execute Hadoop applications without adapting them to the underlying Cassandra distributed storage system [2]. Because the Cassandra File System (CFS) [3] supports the Hadoop Distributed File System API, Hadoop stack applications should run seamlessly in DSE. The report is structured as follows: Section 2 briefly describes the technologies involved in our study. Section 3 gives an overview of the hardware and software components of our experimental environment. Section 4 defines our benchmark methodology. The performed…
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Scientific Computing and Data Management
