Grid-based Approaches for Distributed Data Mining Applications

Lamine M. Aouad; Nhien-An Le-Khac; Tahar Kechadi

arXiv:1703.09807 · cs.DB · March 30, 2017 · 1 citation

Open Access

TL;DR

This paper introduces grid-based methods for distributed clustering and frequent itemset generation, evaluating their performance on an experimental grid system and analyzing the challenges in achieving realistic performance expectations.

Contribution

It presents new distributed data mining algorithms tailored for grid environments and provides a performance evaluation with comparison to an analytical model.

Findings

01. Distributed algorithms are well-adapted for grid environments
02. Performance on grid systems shows significant overheads
03. Realistic performance expectations are challenging to achieve

Abstract

The data mining field is an important source of large-scale applications and datasets, which are becoming increasingly common. In this paper, we present grid-based approaches for two basic data mining applications, together with a performance evaluation on an experimental grid environment that provides monitoring capabilities and configuration tools. We propose a new distributed clustering approach and a distributed frequent itemset generation method, both well adapted to grid environments. The performance evaluation uses the Condor system and its workflow manager, DAGMan. We also compare the measured performance against a simple analytical model to evaluate the overheads related to the workflow engine and the underlying grid system. This comparison shows that realistic performance expectations are currently difficult to achieve on the grid.
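
As a rough illustration of what distributed frequent itemset generation involves, the following minimal sketch counts itemsets locally at each site and then merges the counts to apply a global support threshold. It is a generic count-and-merge scheme, not the algorithm proposed in the paper; the function names (`local_counts`, `global_frequent`), the in-memory `sites` list, and the absolute `min_support` threshold are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, k):
    """Count every k-item subset occurring in one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        # Deduplicate and sort so each itemset has one canonical tuple form.
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def global_frequent(sites, k, min_support):
    """Merge per-site counts and keep itemsets that meet the global threshold."""
    total = Counter()
    for transactions in sites:
        # On a real grid each call would run as a separate job (hypothetical here).
        total.update(local_counts(transactions, k))
    return {itemset: c for itemset, c in total.items() if c >= min_support}


# Two hypothetical sites, each holding its own partition of the transactions.
sites = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["a", "b", "c"]],
]
frequent_pairs = global_frequent(sites, k=2, min_support=3)
```

In this toy run only the local counters, not the raw transactions, need to be moved between sites, which is the usual motivation for this kind of decomposition in a distributed setting.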

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Advanced Database Systems and Queries