Data Partitioning View of Mining Big Data

Shichao Zhang

arXiv:1611.09691 · cs.DB · November 30, 2016 · 2 cites

TL;DR

This paper discusses data partitioning as a key strategy for mining big data, highlighting how dividing data into subsets enables in-memory analysis and the synthesis of global patterns from local ones.

Contribution

The paper revisits data partitioning as a strategy for mining big data and presents findings on the local patterns discovered from subsets of a dataset and their role in deriving global patterns.

Findings

01 Partitioning enables in-memory analysis of large datasets.

02 Local patterns can be synthesized to identify global patterns.

03 Partitioning strategies significantly impact mining effectiveness.

Abstract

There are two main approaches to mining big data in memory. One is to partition a big dataset into several subsets, so that each subset can be mined in memory; global patterns are then obtained by synthesizing the local patterns discovered from these subsets. The other is statistical sampling. This indicates that data partitioning should be an important strategy for mining big data. This paper recalls our work on mining big data with data partitioning and shows some interesting findings among the local patterns discovered from subsets of a dataset.
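The partition-then-synthesize idea in the abstract can be illustrated with a small sketch. This is not the author's synthesis method, which the abstract does not specify; it is a generic two-pass scheme in the style of the classic Partition approach to frequent itemset mining, in which locally frequent itemsets serve as candidates that are then verified against the whole dataset. The function names (`partition`, `local_frequent`, `synthesize`), the contiguous split, and the `max_len` cap on itemset size are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, k):
    """Split the transaction list into up to k contiguous, near-equal chunks."""
    if not transactions:
        return []
    size = -(-len(transactions) // k)  # ceiling division
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]


def local_frequent(chunk, min_support, max_len=2):
    """Itemsets (frozensets) whose relative support inside this chunk is >= min_support."""
    counts = Counter()
    for t in chunk:
        items = sorted(set(t))
        for n in range(1, max_len + 1):
            for combo in combinations(items, n):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(chunk)
    return {s for s, c in counts.items() if c >= threshold}


def synthesize(transactions, k, min_support, max_len=2):
    """Union the local patterns of each chunk, then verify them globally.

    An itemset that is frequent in the whole dataset must be frequent in at
    least one chunk, so the union of local patterns is a complete candidate set.
    """
    candidates = set()
    for chunk in partition(transactions, k):
        candidates |= local_frequent(chunk, min_support, max_len)

    # Second pass: count each candidate over the full dataset.
    n = len(transactions)
    result = {}
    for cand in candidates:
        count = sum(1 for t in transactions if cand <= set(t))
        if count >= min_support * n:
            result[cand] = count / n
    return result


# Toy data, invented for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "d"],
    ["d"],
]
patterns = synthesize(transactions, k=2, min_support=0.5)
```

In this toy run, `{a, c}` is frequent in the first chunk and `{d}` in the second, but neither survives the global check, which shows why local patterns need a synthesis step rather than a simple union. In a real setting each chunk would be loaded and mined separately so that only one subset is in memory at a time; the sketch keeps everything in one list for brevity.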

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Advanced Computational Techniques and Applications