Mining Frequent Itemsets from Secondary Memory

Gösta Grahne; Jianfei Zhu

arXiv:cs/0405069 · cs.DB · August 16, 2016

Open Access

TL;DR

This paper addresses mining frequent itemsets from very large databases, where the data structures used in mining are too large to fit in main memory. It proposes disk-based divide-and-conquer algorithms that reduce the required disk accesses by orders of magnitude.

Contribution

It introduces novel divide-and-conquer algorithms for disk-based frequent itemset mining, enhancing scalability for large datasets.

Findings

01 Reduced the required disk accesses by orders of magnitude

02 Enabled scalable mining of frequent itemsets on very large databases

03 Demonstrated effectiveness through experimental results

Abstract

Mining frequent itemsets is at the core of mining association rules, and is by now quite well understood algorithmically. However, most algorithms for mining frequent itemsets assume that the main memory is large enough for the data structures used in the mining, and very few efficient algorithms deal with the case when the database is very large or the minimum support is very low. Mining frequent itemsets from a very large database poses new challenges, as astronomical amounts of raw data are ubiquitously being recorded in commerce, science and government. In this paper, we discuss approaches to mining frequent itemsets when data structures are too large to fit in main memory. Several divide-and-conquer algorithms are given for mining from disks. Many novel techniques are introduced. Experimental results show that the techniques reduce the required disk accesses by orders of magnitude,…
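To make the terms "frequent itemset", "minimum support", and "divide-and-conquer" concrete, here is a minimal Python sketch of the classic two-pass Partition scheme. This is not the algorithm from the paper, whose details are not given on this page. It is a well-known, simpler technique swapped in for illustration. The function names, the `partition_size` parameter, and the `max_len` cap are all assumptions made for this example, and an in-memory list stands in for a database on disk.

```python
from collections import Counter
from itertools import combinations, islice


def read_partitions(transactions, partition_size):
    """Yield lists of at most partition_size transactions.

    Hypothetical stand-in for reading one memory-sized block from disk.
    """
    it = iter(transactions)
    while True:
        chunk = list(islice(it, partition_size))
        if not chunk:
            return
        yield chunk


def count_itemsets(transactions, max_len):
    """Count every itemset of size <= max_len occurring in the transactions."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    return counts


def partition_mine(transactions, min_support, partition_size, max_len=3):
    """Two-pass Partition-style mining; min_support is a fraction in (0, 1]."""
    transactions = list(transactions)  # assumption: would be re-read from disk
    n = len(transactions)

    # Pass 1: an itemset that is frequent globally must be frequent in at
    # least one partition, so locally frequent itemsets form the candidates.
    candidates = set()
    for chunk in read_partitions(transactions, partition_size):
        local = count_itemsets(chunk, max_len)
        threshold = min_support * len(chunk)
        candidates.update(s for s, c in local.items() if c >= threshold)

    # Pass 2: count the global support of the candidates only.
    total = Counter()
    for chunk in read_partitions(transactions, partition_size):
        for t in chunk:
            items = set(t)
            for cand in candidates:
                if items.issuperset(cand):
                    total[cand] += 1

    return {s: c for s, c in total.items() if c >= min_support * n}
```

The sketch reads the database exactly twice regardless of how many partitions there are, which is the kind of disk-access saving that motivates partitioned approaches. The brute-force local counting is only suitable for tiny examples.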

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms