GRADOOP: Scalable Graph Data Management and Analytics with Hadoop
Martin Junghanns, André Petermann, Kevin Gómez, Erhard Rahm

TL;DR
Gradoop is a scalable, flexible framework built on Hadoop for managing and analyzing large, schema-free graph data using high-level operators and a domain-specific language.
Contribution
It introduces an end-to-end graph analytics approach on Hadoop with a new data model, operators, and a domain-specific language for complex graph analysis.
Findings
Successfully used for business intelligence analysis
Applied to social network data analysis
Supports multiple graphs and rich data semantics
Abstract
Many Big Data applications in business and science require the management and analysis of huge amounts of graph data. Previous approaches for graph analytics such as graph databases and parallel graph processing systems (e.g., Pregel) either lack sufficient scalability or flexibility and expressiveness. We are therefore developing a new end-to-end approach for graph data management and analysis based on the Hadoop ecosystem, called Gradoop (Graph analytics on Hadoop). Gradoop is designed around the so-called Extended Property Graph Data Model (EPGM) supporting semantically rich, schema-free graph data within many distinct graphs. A set of high-level operators is provided for analyzing both single graphs and collections of graphs. Based on these operators, we propose a domain-specific language to define analytical workflows. The Gradoop graph store is currently utilizing HBase for…
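The data model described in the abstract (schema-free vertices and edges with properties, shared across many distinct graphs, plus operators on single graphs and on graph collections) can be illustrated with a minimal Python sketch. This is not Gradoop's API: the class names `Element`, `Edge`, and `LogicalGraph`, the functions `select` and `combine`, and all sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class Element:
    """A labeled vertex with schema-free key/value properties (hypothetical)."""
    id: int
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge(Element):
    """A labeled, directed edge; source and target are vertex ids."""
    source: int = -1
    target: int = -1


@dataclass
class LogicalGraph:
    """A named subset of the shared vertex and edge space.

    Graphs only reference element ids, so one vertex or edge may
    belong to several graphs at once.
    """
    id: int
    label: str
    vertex_ids: Set[int] = field(default_factory=set)
    edge_ids: Set[int] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


def select(graphs: List[LogicalGraph],
           predicate: Callable[[LogicalGraph], bool]) -> List[LogicalGraph]:
    """Collection operator (assumed): keep the graphs that satisfy a predicate."""
    return [g for g in graphs if predicate(g)]


def combine(a: LogicalGraph, b: LogicalGraph, new_id: int) -> LogicalGraph:
    """Binary graph operator (assumed): union of vertex and edge sets."""
    return LogicalGraph(new_id, "Combined",
                        a.vertex_ids | b.vertex_ids,
                        a.edge_ids | b.edge_ids)


# Sample data: vertices with different property keys (no fixed schema).
vertices = {
    1: Element(1, "Person", {"name": "Alice"}),
    2: Element(2, "Person", {"name": "Bob", "city": "Leipzig"}),
    3: Element(3, "Forum", {"title": "Graphs"}),
}
edges = {
    10: Edge(10, "knows", {}, 1, 2),
    11: Edge(11, "memberOf", {}, 2, 3),
}

# Vertex 2 is a member of both logical graphs.
g1 = LogicalGraph(100, "Community", {1, 2}, {10}, {"vertexCount": 2})
g2 = LogicalGraph(101, "Community", {2, 3}, {11}, {"vertexCount": 2})

combined = combine(g1, g2, 102)
with_alice = select([g1, g2], lambda g: 1 in g.vertex_ids)
```

Storing only element ids in each graph is one simple way to let graphs overlap without duplicating vertices or edges; a distributed store would need a different physical layout.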
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Database Systems and Queries · Advanced Graph Neural Networks
