Fault Tolerant Frequent Pattern Mining
Sameh Shohdy, Abhinav Vishnu, Gagan Agrawal

TL;DR
This paper introduces a novel fault-tolerant FP-Growth algorithm that uses in-memory checkpointing and advanced MPI features to ensure high efficiency and scalability in large-scale data mining tasks.
Contribution
It presents a new parallel, algorithm-level fault-tolerant FP-Growth algorithm that uses advanced MPI features and reuses the memory occupied by the dataset for checkpointing, together with an efficient recovery algorithm.
Findings
Achieves O(1) space complexity for checkpointing.
Demonstrates 20x speed-up over Spark.
Shows efficient checkpointing and recovery on large clusters.
Abstract
The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases recovery completes without any disk access and incurs no memory overhead for checkpointing. We evaluate our FT algorithm on a large scale InfiniBand…
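The idea of checkpointing into the dataset's own memory can be illustrated with a small single-process sketch. It is not the paper's algorithm: there is no MPI and no FP-tree, and a simple per-item counter stands in for the mining state. The names ITEMS, WIDTH, make_buffer, checkpoint, restore and count_items are hypothetical, and the sketch assumes the buffer survives the simulated failure, as a checkpoint held in a peer's memory would. It only shows that transactions already consumed leave slots that can hold the checkpoint, so no extra checkpoint buffer is allocated.

```python
from array import array

ITEMS = 4   # hypothetical alphabet size: items are 0..3
WIDTH = 3   # hypothetical fixed transaction width; -1 pads short rows


def make_buffer(transactions):
    """Pack transactions into one flat int buffer, padding with -1."""
    buf = array("i")
    for t in transactions:
        buf.extend(list(t) + [-1] * (WIDTH - len(t)))
    return buf


def checkpoint(buf, counts, done):
    """Write [done, counts...] over slots of already-consumed transactions."""
    needed = 1 + ITEMS
    if done * WIDTH < needed:
        return False  # not enough consumed space yet, skip this checkpoint
    buf[0] = done
    buf[1:1 + ITEMS] = array("i", counts)
    return True


def restore(buf):
    """Read back the last checkpoint (assumes one was written)."""
    return buf[0], list(buf[1:1 + ITEMS])


def count_items(buf, n, start=0, counts=None, every=2, fail_at=None):
    """Count item occurrences, checkpointing in place every `every` rows."""
    counts = [0] * ITEMS if counts is None else list(counts)
    for i in range(start, n):
        if i == fail_at:
            raise RuntimeError("simulated process failure")
        for item in buf[i * WIDTH:(i + 1) * WIDTH]:
            if item >= 0:
                counts[item] += 1
        if (i + 1) % every == 0:
            checkpoint(buf, counts, i + 1)
    return counts


if __name__ == "__main__":
    txs = [[0, 1], [1, 2], [0, 1, 3], [2], [1, 3]]
    buf = make_buffer(txs)
    try:
        count_items(buf, len(txs), fail_at=3)
    except RuntimeError:
        done, saved = restore(buf)  # resume from the in-place checkpoint
        print(count_items(buf, len(txs), start=done, counts=saved))
```

The checkpoint here has a fixed size, so it always fits once two transactions have been consumed. A real FP-tree grows as transactions are inserted, which is presumably why the abstract also mentions disk-based checkpointing as a fallback for recovery.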
