Heterogeneous Replica for Query on Cassandra
Jialin Qiao, Xiangdong Huang, Lei Rui, Jianmin Wang

TL;DR
This paper introduces a heterogeneous replica mechanism for Cassandra in which every replica holds the same data but serializes it differently on disk, so each query can be served by the replica whose layout suits it. On the TPC-H data set, reads become up to 100x faster.
Contribution
It proposes a novel heterogeneous replica mechanism that enhances Cassandra's read performance without sacrificing write throughput or data recovery.
Findings
Read performance improved by up to two orders of magnitude
Heterogeneous replicas maintain high write throughput and data recovery
Effective for queries with diverse schema requirements
Abstract
Cassandra is a popular structured storage system that offers high performance, scalability, and high availability, and it is typically used to store data with sortable attributes. When deploying and configuring Cassandra, it is important to design a column-family schema that accelerates the target queries. However, a single schema suits only some queries and leaves the others with high latency. In this paper, we propose a new replica mechanism, called heterogeneous replica, that greatly reduces query latency while preserving high write throughput and data recovery. Under this mechanism, the replicas hold the same dataset but serialize it differently on disk. By implementing the heterogeneous replica mechanism on Cassandra, we show that Cassandra's read performance can be improved by two orders of magnitude on the TPC-H data set.
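The idea in the abstract can be illustrated with a minimal Python sketch. This is a toy model, not Cassandra's API and not the authors' implementation: the `Replica` class, the `route` function, and the column names (`order_id`, `customer`, `ship_date`) are all hypothetical. It assumes each replica keeps the same rows sorted by a different column, and that a range query is sent to the replica whose sort column matches the queried column.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows; every replica stores exactly this dataset.
ROWS = [
    {"order_id": 1, "customer": "c3", "ship_date": "1998-01-05"},
    {"order_id": 2, "customer": "c1", "ship_date": "1998-03-10"},
    {"order_id": 3, "customer": "c2", "ship_date": "1998-02-20"},
    {"order_id": 4, "customer": "c1", "ship_date": "1998-01-15"},
    {"order_id": 5, "customer": "c3", "ship_date": "1998-03-01"},
    {"order_id": 6, "customer": "c2", "ship_date": "1998-02-01"},
]


class Replica:
    """Toy replica: same rows, stored sorted by one chosen column."""

    def __init__(self, rows, sort_key):
        self.sort_key = sort_key
        self.rows = sorted(rows, key=lambda r: r[sort_key])
        self.keys = [r[sort_key] for r in self.rows]

    def range_query(self, column, low, high):
        """Return (matching rows, number of rows touched)."""
        if column == self.sort_key:
            # Layout matches the query: binary search, then read a slice.
            start = bisect_left(self.keys, low)
            end = bisect_right(self.keys, high)
            return self.rows[start:end], end - start
        # Layout does not match: every row has to be scanned.
        hits = [r for r in self.rows if low <= r[column] <= high]
        return hits, len(self.rows)


def route(replicas, column):
    """Pick the replica sorted by the queried column, else the first one."""
    for replica in replicas:
        if replica.sort_key == column:
            return replica
    return replicas[0]


replicas = [Replica(ROWS, "customer"), Replica(ROWS, "ship_date")]
chosen = route(replicas, "ship_date")
hits, touched = chosen.range_query("ship_date", "1998-02-01", "1998-02-28")
print(sorted(r["order_id"] for r in hits), "rows touched:", touched)
```

In this toy model both replicas return the same rows for the date-range query, but the replica sorted by `ship_date` reads only the matching slice while the one sorted by `customer` scans the whole dataset. Because every replica still contains the full dataset, any one of them could in principle be rebuilt from another by re-sorting, which is how the sketch reflects the abstract's point about data recovery.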
Taxonomy
Topics: Advanced Data Storage Technologies · Cloud Computing and Resource Management · Caching and Content Delivery
