Processing Database Joins over a Shared-Nothing System of Multicore Machines
Abhirup Chakraborty

TL;DR
This paper presents a scalable system for distributed database joins that leverages multicore processing within each node, achieving near-linear speedup and significant intra-node performance improvements.
Contribution
It introduces a parallelized, pipelined approach to distributed joins that removes synchronization barriers and exploits multicore architectures for scalability.
Findings
3.5x intra-node performance gain with four threads
Near-linear cluster-wide speedup with increasing nodes
Performance dictated by intra-node computational loads
Abstract
To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and communication-intensive workload---processing distributed joins. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, parallelizes the communication by allowing multiple simultaneous data transfers (send/receive), and removes synchronization barriers (a scalability bottleneck in a distributed data processing system). Our experimental results show that using only four threads per node the framework achieves a 3.5x gains in intra-node performance while compared with a single-threaded counterpart.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCloud Computing and Resource Management · Graph Theory and Algorithms · Distributed and Parallel Computing Systems
