Heterogeneous Replica for Query on Cassandra

Jialin Qiao; Xiangdong Huang; Lei Rui; Jianmin Wang

arXiv:1810.01037 · cs.DB · October 3, 2018

TL;DR

This paper introduces a heterogeneous replica mechanism for Cassandra that significantly improves query performance by using different serialization on disk for each replica, achieving up to 100x faster reads.

Contribution

It proposes a novel heterogeneous replica mechanism that enhances Cassandra's read performance without sacrificing write throughput or data recovery.

Findings

01. Read performance improved by up to two orders of magnitude

02. Heterogeneous replicas maintain high write throughput and data recovery

03. Effective for queries with diverse schema requirements

Abstract

Cassandra is a popular structured storage system offering high performance, scalability, and high availability, and it is usually used to store data that has some sortable attributes. When deploying and configuring Cassandra, it is important to design a suitable schema of column families to accelerate the target queries. However, a single schema suits only a subset of queries and leaves the others with high latency. In this paper, we propose a new replica mechanism, called heterogeneous replica, that greatly reduces query latency while preserving high write throughput and data recovery. With this mechanism, all replicas hold the same dataset, but each replica serializes it differently on disk. By implementing the heterogeneous replica mechanism on Cassandra, we show that the read performance of Cassandra can be improved by two orders of magnitude on the TPC-H data set.
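
The idea in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation and uses no Cassandra API. It assumes that "different serialization on disk" means each replica keeps the same rows sorted on a different column, and that a read is routed to the replica whose sort order matches the queried column. The names `Replica`, `route`, and `ROWS`, and the column layout, are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy rows: (order_key, ship_date, price). Column layout is hypothetical.
ROWS = [
    (3, "1995-02-01", 40.0),
    (1, "1995-03-15", 10.0),
    (2, "1995-01-20", 25.0),
    (4, "1995-03-02", 55.0),
]


class Replica:
    """Holds the full dataset, sorted on one column (its assumed on-disk order)."""

    def __init__(self, rows, sort_col):
        self.sort_col = sort_col
        self.rows = sorted(rows, key=lambda r: r[sort_col])
        self.keys = [r[sort_col] for r in self.rows]

    def range_scan(self, col, lo, hi):
        """Return (matching rows, number of rows touched) for lo <= row[col] <= hi."""
        if col == self.sort_col:
            # Sort order matches the query: binary search, read a contiguous slice.
            start = bisect_left(self.keys, lo)
            end = bisect_right(self.keys, hi)
            return self.rows[start:end], end - start
        # Sort order does not match: every row has to be inspected.
        hits = [r for r in self.rows if lo <= r[col] <= hi]
        return hits, len(self.rows)


def route(replicas, col):
    """Pick a replica sorted on the queried column, else fall back to the first."""
    for rep in replicas:
        if rep.sort_col == col:
            return rep
    return replicas[0]


# Two replicas of the same data: one sorted by order_key, one by ship_date.
replicas = [Replica(ROWS, 0), Replica(ROWS, 1)]

# A date-range query is routed to the ship_date replica.
hits, touched = route(replicas, 1).range_scan(1, "1995-03-01", "1995-03-31")

# The same query against the order_key replica touches every row.
_, touched_full = replicas[0].range_scan(1, "1995-03-01", "1995-03-31")

# Recovery sketch: a lost replica is rebuilt by re-sorting another replica's rows.
rebuilt = Replica(replicas[0].rows, 1)
```

In this toy setup the routed query touches only the rows inside the date range, while the mismatched replica scans the whole dataset. Because every replica holds the same rows, any one of them can serve as the source for rebuilding another, which is one way to read the abstract's claim that data recovery is preserved.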

Taxonomy

Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Caching and Content Delivery