Pipelined Workflow in Hybrid MPI/Pthread runtime for External Memory Graph Construction

Sandeep Gupta

arXiv:1210.8242·cs.DB·November 1, 2012

Open Access

TL;DR

This paper presents a pipelined hybrid MPI/Pthread approach to external memory graph construction on SSD-enabled supercomputers. It reports a four- to six-fold speedup over the currently available implementation and handles graphs of up to 8 billion edges.

Contribution

It introduces a novel pipelined processing scheme for external memory graph construction, enabling scalable CSR creation on supercomputers with SSDs.

Findings

01

Runs four to six times faster than the currently available implementation.

02

Handles up to 8 billion edges (128GB) using external memory.

03

Maintains efficiency at large graph sizes where prior schemes degrade.

Abstract

Graph construction from a given set of edges is a data-intensive operator that appears in social network analysis, ontology-enabled databases, and other analytics processing. The operator converts an edge list into a compressed sparse row (CSR) representation (or sometimes into an adjacency list or clustered B-Tree storage). In this work, we show how to scale CSR construction to massive graphs on SSD-enabled supercomputers such as Gordon using pipelined processing. We develop several abstractions and operations for external memory and parallel processing of edge lists and integer arrays, and use them to build a scalable algorithm for creating the CSR representation. Our experiments demonstrate that this scheme is four to six times faster than the currently available implementation. Moreover, our scheme can handle up to 8 billion edges (128GB) by using external memory as compared to prior…
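
To make the target of the operator concrete, here is a minimal in-memory sketch of converting an edge list to CSR with a counting pass and a prefix sum. It is not the paper's pipelined external memory algorithm; the function name `build_csr`, the array names `row_ptr` and `col_idx`, and the assumption that vertices are integers in the range 0 to `num_vertices - 1` are illustrative choices, not taken from the paper.

```python
from typing import List, Tuple


def build_csr(num_vertices: int,
              edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert a directed edge list to CSR arrays (row_ptr, col_idx).

    Assumes every vertex id is an integer in [0, num_vertices).
    """
    # Pass 1: count the out-degree of each source vertex.
    row_ptr = [0] * (num_vertices + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1

    # Prefix sum turns the counts into row start offsets.
    for v in range(num_vertices):
        row_ptr[v + 1] += row_ptr[v]

    # Pass 2: place each destination into the next free slot of its row.
    col_idx = [0] * len(edges)
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1

    return row_ptr, col_idx


# Neighbors of vertex v are col_idx[row_ptr[v]:row_ptr[v + 1]].
row_ptr, col_idx = build_csr(3, [(2, 0), (0, 1), (1, 2), (0, 2)])
```

Both passes scan the edge list sequentially, which is one reason this style of construction is a natural candidate for streaming from external storage; the random writes into `col_idx` in the second pass are the part that an out-of-core scheme would have to handle differently.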

Taxonomy

Topics: Graph Theory and Algorithms · Distributed and Parallel Computing Systems · Scientific Computing and Data Management