Data Partitioning View of Mining Big Data
Shichao Zhang

TL;DR
This paper discusses data partitioning as a key strategy for mining big data, highlighting how dividing data into subsets enables in-memory analysis and the synthesis of global patterns from local ones.
Contribution
It revisits and emphasizes the importance of data partitioning in big data mining, presenting findings on local pattern discovery and their role in understanding global data.
Findings
Partitioning enables in-memory analysis of large datasets.
Local patterns can be synthesized to identify global patterns.
Partitioning strategies significantly impact mining effectiveness.
Abstract
There are two main approximate approaches to mining big data in memory. One is to partition a big dataset into several subsets so that each subset can be mined in memory; global patterns can then be obtained by synthesizing the local patterns discovered from these subsets. The other is statistical sampling. This indicates that data partitioning should be an important strategy for mining big data. This paper recalls our work on mining big data with data partitioning and shows some interesting findings among the local patterns discovered from subsets of a dataset.
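The partition-then-synthesize idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the function names (`partition`, `mine_local`, `synthesize`), the choice of frequent itemsets as the pattern type, and the size-weighted averaging of local supports are all assumptions made for illustration. Note that the weighting treats an itemset as having zero support in any partition where it fell below the local threshold, so the synthesized value is only a lower bound on the true global support.

```python
from collections import Counter
from itertools import combinations


def partition(transactions, num_parts):
    """Split a list of transactions into roughly equal contiguous subsets."""
    size = max(1, -(-len(transactions) // num_parts))  # ceiling division
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]


def mine_local(transactions, min_support, max_size=2):
    """Return {itemset: local support} for itemsets frequent in one subset.

    Brute-force counting of all itemsets up to max_size items; adequate for
    a subset that fits in memory, but not an optimized miner.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def synthesize(local_patterns, sizes, min_support):
    """Combine local supports into estimated global supports.

    Assumption: each subset is weighted by its share of all transactions,
    and an itemset missing from a subset's result contributes zero there.
    """
    total = sum(sizes)
    global_support = Counter()
    for patterns, n in zip(local_patterns, sizes):
        for itemset, support in patterns.items():
            global_support[itemset] += support * n / total
    return {s: v for s, v in global_support.items() if v >= min_support}


def mine_by_partitioning(transactions, num_parts, min_support):
    """Partition, mine each subset independently, then synthesize."""
    parts = partition(transactions, num_parts)
    local = [mine_local(p, min_support) for p in parts]
    return synthesize(local, [len(p) for p in parts], min_support)
```

In this sketch only one subset needs to be in memory at a time during `mine_local`; the synthesis step works on the much smaller pattern dictionaries. Because locally infrequent itemsets are dropped before synthesis, a pattern that is frequent overall but spread thinly across subsets can be missed, which is one way the choice of partitioning can affect the result.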
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Advanced Computational Techniques and Applications
