Mining Frequent Itemsets from Secondary Memory
Gösta Grahne, Jianfei Zhu

TL;DR
This paper addresses the challenge of mining frequent itemsets from very large databases that cannot fit into main memory, proposing new disk-based algorithms that significantly reduce disk access and improve scalability.
Contribution
It introduces novel divide-and-conquer algorithms for disk-based frequent itemset mining, enhancing scalability for large datasets.
Findings
Reduced disk accesses by orders of magnitude
Enabled scalable data mining on very large databases
Demonstrated effectiveness through experimental results
Abstract
Mining frequent itemsets is at the core of mining association rules, and is by now quite well understood algorithmically. However, most algorithms for mining frequent itemsets assume that the main memory is large enough for the data structures used in the mining, and very few efficient algorithms deal with the case when the database is very large or the minimum support is very low. Mining frequent itemsets from a very large database poses new challenges, as astronomical amounts of raw data are ubiquitously being recorded in commerce, science and government. In this paper, we discuss approaches to mining frequent itemsets when data structures are too large to fit in main memory. Several divide-and-conquer algorithms are given for mining from disks. Many novel techniques are introduced. Experimental results show that the techniques reduce the required disk accesses by orders of magnitude,…
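To make the divide-and-conquer idea concrete, here is a minimal sketch of a generic two-pass partitioning scheme for frequent itemset mining. It is not the authors' algorithm, whose details are not given in this summary. It assumes the database can be split into chunks that each fit in memory, and it relies on the fact that any globally frequent itemset must be locally frequent in at least one chunk. The names `frequent_itemsets`, `partitioned_mine`, and `max_size` are hypothetical, and the in-memory list of chunks stands in for data that would be read from disk.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Brute-force miner for one in-memory chunk (illustrative only).

    Counts every itemset of up to max_size items and keeps those that
    occur in at least min_count transactions.
    """
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def partitioned_mine(chunks, min_support, max_size=3):
    """Two-pass partition scheme (assumed technique, not from the paper).

    chunks: list of chunks, each a list of transactions. In a real
    disk-based setting each chunk would be loaded from disk in turn.
    min_support: relative support threshold in (0, 1].
    """
    # Pass 1: collect itemsets that are locally frequent in some chunk.
    candidates = set()
    total = 0
    for chunk in chunks:
        total += len(chunk)
        local_min = min_support * len(chunk)
        candidates |= set(frequent_itemsets(chunk, local_min, max_size))

    # Pass 2: count every candidate against the whole database.
    counts = Counter()
    for chunk in chunks:
        for t in chunk:
            items = set(t)
            for cand in candidates:
                if items.issuperset(cand):
                    counts[cand] += 1

    global_min = min_support * total
    return {c: n for c, n in counts.items() if n >= global_min}


chunks = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["b", "c"]],
]
result = partitioned_mine(chunks, min_support=0.5)
```

Each chunk is scanned exactly twice, so the number of passes over the data stays fixed regardless of how many itemsets are frequent. The cost is that the candidate set from the first pass can be much larger than the final answer when the data is skewed across chunks.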
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms
