Bala-Join: An Adaptive Hash Join for Balancing Communication and Computation in Geo-Distributed SQL Databases
Wenlong Song, Hui Li, Bingying Zhai, Jinxin Yang, Pinghui Wang, Luming Sun, Ming Li, Jiangtao Cui

TL;DR
Bala-Join is an adaptive hash join algorithm designed for geo-distributed SQL databases that optimally balances communication and computation, especially under skewed data workloads, significantly improving throughput.
Contribution
It introduces Bala-Join, combining BPPR, a skew-aware redistribution method, with real-time skew detection and synchronization mechanisms for better distributed join performance.
Findings
Increases throughput by 25%-61% over existing solutions
Effectively handles skewed data in geo-distributed environments
Reduces network overhead while balancing computational load
Abstract
Shared-nothing geo-distributed SQL databases, such as CockroachDB, are increasingly vital for enterprise applications requiring data resilience and locality. However, we encountered significant performance degradation at the customer side, especially when their deployments span multiple data centers over a Wide Area Network (WAN). Our investigation identifies the bottleneck in the performance of the Distributed Hash Join (Dist-HJ) algorithm, which is contingent upon a crucial balance between communication overhead and computational load. This balance is severely disrupted when processing skewed data from real-world customer workloads, leading to the observed performance decline. To tackle this challenge, we introduce Bala-Join, an adaptive solution to balance the computation and network load in Dist-HJ execution. Our approach consists of the Balanced Partition and Partial Replication…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCloud Computing and Resource Management · Advanced Database Systems and Queries · Distributed systems and fault tolerance
