DIMSpan - Transactional Frequent Subgraph Mining with Distributed In-Memory Dataflow Systems

André Petermann; Martin Junghanns; Erhard Rahm

arXiv:1703.01910 · cs.DB · March 7, 2017 · 5 cites

TL;DR

DIMSpan is a scalable approach to transactional frequent subgraph mining in large graph collections. It runs on distributed in-memory dataflow systems such as Apache Spark or Apache Flink and is optimized for runtime and low network traffic.

Contribution

DIMSpan introduces a distributed approach to frequent subgraph mining that is optimized for runtime and minimal network traffic while remaining memory-aware.

Findings

1. Demonstrates high scalability on large graph collections

2. Effective pruning and optimization techniques improve performance

3. Outperforms existing solutions in distributed environments

Abstract

Transactional frequent subgraph mining identifies frequent subgraphs in a collection of graphs. This research problem has wide applicability and increasingly requires higher scalability than single-machine solutions can offer to address the needs of Big Data use cases. We introduce DIMSpan, an advanced approach to frequent subgraph mining that utilizes the features provided by distributed in-memory dataflow systems such as Apache Spark or Apache Flink. It determines the complete set of frequent subgraphs from arbitrary string-labeled directed multigraphs as they occur in social, business and knowledge networks. DIMSpan is optimized for runtime and minimal network traffic while remaining memory-aware. An extensive performance evaluation on large graph collections shows the scalability of DIMSpan and the effectiveness of its pruning and optimization techniques.
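To make the transactional setting concrete, the following is a minimal single-machine sketch of support counting, and it is not DIMSpan's implementation. It assumes that a pattern's support is the number of graphs in the collection containing it at least once, and it covers only single-edge patterns, which is the usual starting point of pattern-growth miners; pattern growth, canonical labeling, and distribution are omitted. The graph representation, the function names, and the sample data are all hypothetical.

```python
from collections import Counter
from math import ceil

# Assumed representation (illustrative only): a graph is a pair
# (vertex_labels, edges), where vertex_labels maps vertex id -> string label
# and edges is a list of (source_id, edge_label, target_id).
# Parallel edges are allowed, since the input graphs are directed multigraphs.


def single_edge_patterns(graph):
    """Return the distinct (source label, edge label, target label) triples."""
    vertex_labels, edges = graph
    return {(vertex_labels[s], label, vertex_labels[t]) for s, label, t in edges}


def frequent_single_edge_patterns(graphs, min_support):
    """Count in how many graphs each single-edge pattern occurs.

    min_support is a fraction of the collection size. A pattern is counted
    at most once per graph, however many times it is embedded in that graph.
    """
    threshold = ceil(min_support * len(graphs))
    support = Counter()
    for graph in graphs:
        # Updating with a set adds exactly 1 per distinct pattern per graph.
        support.update(single_edge_patterns(graph))
    return {p: c for p, c in support.items() if c >= threshold}


# Hypothetical sample collection of three small labeled graphs.
graphs = [
    ({0: "Person", 1: "Person", 2: "City"},
     [(0, "knows", 1), (0, "knows", 1), (0, "livesIn", 2)]),
    ({0: "Person", 1: "City", 2: "Company"},
     [(0, "livesIn", 1), (0, "worksAt", 2)]),
    ({0: "Person", 1: "Person"},
     [(0, "knows", 1), (1, "knows", 0)]),
]

result = frequent_single_edge_patterns(graphs, 0.6)
```

With a minimum support of 0.6 over three graphs, a pattern must occur in at least two of them. The parallel `knows` edges in the first graph count only once, and the `worksAt` pattern is dropped because it appears in a single graph.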

Taxonomy

Topics: Data Mining Algorithms and Applications · Graph Theory and Algorithms · Advanced Database Systems and Queries