Processing Database Joins over a Shared-Nothing System of Multicore   Machines

Abhirup Chakraborty

arXiv:1804.09324·cs.DB·April 26, 2018

Processing Database Joins over a Shared-Nothing System of Multicore Machines

Abhirup Chakraborty

PDF

Open Access

TL;DR

This paper presents a scalable system for distributed database joins that leverages multicore processing within each node, achieving near-linear speedup and significant intra-node performance improvements.

Contribution

It introduces a parallelized, pipelined approach to distributed joins that removes synchronization barriers and exploits multicore architectures for scalability.

Findings

01

3.5x intra-node performance gain with four threads

02

Near-linear cluster-wide speedup with increasing nodes

03

Performance dictated by intra-node computational loads

Abstract

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and communication-intensive workload---processing distributed joins. By exploiting multiple processing cores within the individual machines, we implement a system to process database joins that parallelizes computation within each node, pipelines the computation with communication, parallelizes the communication by allowing multiple simultaneous data transfers (send/receive), and removes synchronization barriers (a scalability bottleneck in a distributed data processing system). Our experimental results show that using only four threads per node the framework achieves a 3.5x gains in intra-node performance while compared with a single-threaded counterpart.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsCloud Computing and Resource Management · Graph Theory and Algorithms · Distributed and Parallel Computing Systems