The network footprint of replication in popular DBMSs
Muhammad Karam Shehzad, Jam Muhammad Yousif, Muhammad Saqib Ilyas, and Adnan Iqbal

TL;DR
This paper empirically compares the network and resource overheads of replication in MySQL, PostgreSQL, and Cassandra, revealing significant traffic increases and the effects of compression on network load.
Contribution
It provides a comparative analysis of replication overheads in three popular DBMSs using real traffic data, highlighting the impact of replication and compression.
Findings
Total network traffic with two replicas can be up to 300% higher than with no replica.
Enabling MySQL's built-in compression for replication traffic reduces total network traffic by up to 20%.
CPU and memory utilization are unaffected by replication scale.
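The percentages in the findings are relative to a no-replica (or uncompressed) baseline, and "300% higher" is easy to misread as three times the baseline. The minimal Python sketch below works through the arithmetic. The helper names and the traffic volumes (100 MB and 250 MB) are hypothetical illustrations, not measurements from the paper.

```python
def percent_change(baseline: float, observed: float) -> float:
    """Signed change of `observed` relative to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (observed - baseline) / baseline * 100.0


def apply_percent_change(baseline: float, percent: float) -> float:
    """Return `baseline` changed by `percent` (a negative value is a reduction)."""
    return baseline * (1.0 + percent / 100.0)


# Hypothetical volumes, for illustration only (not data from the paper).
no_replica_mb = 100.0
uncompressed_mb = 250.0

# "Up to 300% higher" means up to 4x the baseline, not 3x.
two_replicas_mb = apply_percent_change(no_replica_mb, 300.0)
print(f"two replicas: {two_replicas_mb:.1f} MB")  # two replicas: 400.0 MB
print(f"increase: {percent_change(no_replica_mb, two_replicas_mb):.0f}%")  # increase: 300%

# "Up to 20% less" means the compressed total is at least 0.8x the uncompressed one.
compressed_mb = apply_percent_change(uncompressed_mb, -20.0)
print(f"compressed: {compressed_mb:.1f} MB")  # compressed: 200.0 MB
```

So a 300% increase means the two-replica setup moved up to four times the no-replica traffic, while a 20% reduction leaves 80% of the uncompressed volume on the wire.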
Abstract
Database replication is an important component of reliable, disaster-tolerant, and highly available distributed systems. However, data replication also incurs communication and processing overhead. Quantifying these overheads is crucial both for choosing a suitable DBMS from the several available options and for capacity planning. In this paper, we present results from a comparative empirical analysis of the replication activity of three commonly used DBMSs (MySQL, PostgreSQL, and Cassandra) under both text and image traffic. In our experiments, the total traffic with two replicas (which is the norm) was as much as 300% higher than the total traffic with no replica. Furthermore, enabling the compression option for replication traffic that is built into MySQL reduced the total network traffic by as much as 20%. We also found that average CPU utilization and memory utilization were not…
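The abstract contrasts text and image traffic and reports a bounded benefit from MySQL's built-in compression. One plausible factor, offered here as an assumption rather than as the paper's explanation, is that the savings depend on how compressible the replicated payload is. The Python sketch below applies the standard-library `zlib` module (DEFLATE, the algorithm family that MySQL's compressed client/server protocol has traditionally used) to two synthetic payloads: repetitive SQL text, and pseudo-random bytes standing in for already-compressed image formats such as JPEG. The payloads and the `compression_ratio` helper are hypothetical and do not reproduce the paper's workloads.

```python
import random
import zlib


def compression_ratio(payload: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(payload, level)) / len(payload)


# Hypothetical text-like replication payload: repetitive SQL statements.
text_payload = b"".join(
    f"INSERT INTO posts (id, body) VALUES ({i}, 'hello replication');\n".encode()
    for i in range(500)
)

# Hypothetical stand-in for image traffic: seeded pseudo-random bytes, which,
# like JPEG data, are already high-entropy (an assumption, not the paper's data).
image_payload = random.Random(42).randbytes(len(text_payload))

print(f"text : {compression_ratio(text_payload):.2f}")
print(f"image: {compression_ratio(image_payload):.2f}")
```

Expect the text ratio to be far below 1 and the random-byte ratio to sit at or slightly above 1, since DEFLATE cannot shrink high-entropy data and adds a few bytes of framing. `randbytes` requires Python 3.9 or later.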
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed systems and fault tolerance · Advanced Data Storage Technologies
