An Efficient Data Structure for Fast Mining High Utility Itemsets
Zhi-Hong Deng, Shulei Ma, He Liu

TL;DR
This paper introduces PUN-list, a new data structure that speeds up high utility itemset mining by reducing utility computations and enabling effective pruning. The authors support these claims with extensive experiments.
Contribution
The paper presents PUN-list, a novel data structure, and MIP, a fast mining method that outperforms existing algorithms in high utility itemset mining.
Findings
MIP is at least ten times faster than recent algorithms.
PUN-list effectively reduces utility computation costs.
The method works efficiently on both synthetic and real datasets.
Abstract
In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information about an itemset and a utility upper bound to facilitate the mining of high utility itemsets. Based on PUN-lists, we present a method, called MIP (Mining high utility Itemset using PUN-Lists), for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset, and the PUN-lists of long itemsets can be constructed quickly from the PUN-lists of short itemsets. Third, by employing the utility upper bound lying in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search…
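To make the terms "utility" and "utility upper bound" concrete, here is a minimal sketch of the general problem setting. It is not the PUN-list or the MIP algorithm, whose internals are not described in this summary. It assumes the common formulation in which an item's utility in a transaction is quantity times unit profit, and it swaps in transaction-weighted utility (TWU), a standard upper bound from the high utility mining literature, as the pruning bound. The toy database, the profit table, and all function names are hypothetical.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
# Hypothetical per-unit profit of each item.
profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def transaction_utility(tx):
    """Total utility of every item in one transaction."""
    return sum(qty * profit[item] for item, qty in tx.items())


def itemset_utility(itemset, db):
    """Sum of the itemset's utility over the transactions that contain it."""
    total = 0
    for tx in db:
        if all(item in tx for item in itemset):
            total += sum(tx[item] * profit[item] for item in itemset)
    return total


def twu(itemset, db):
    """Transaction-weighted utility: an upper bound on the utility of the
    itemset and of all its supersets (assumed stand-in for the paper's bound)."""
    return sum(
        transaction_utility(tx)
        for tx in db
        if all(item in tx for item in itemset)
    )


def mine(db, min_util):
    """Level-wise search that prunes any candidate whose TWU is below min_util."""
    items = sorted({item for tx in db for item in tx})
    result = {}
    frontier = [(i,) for i in items if twu((i,), db) >= min_util]
    while frontier:
        next_frontier = []
        for itemset in frontier:
            utility = itemset_utility(itemset, db)
            if utility >= min_util:
                result[itemset] = utility
            # Extend only with lexicographically larger items to avoid duplicates.
            for i in items:
                if i > itemset[-1]:
                    candidate = itemset + (i,)
                    if twu(candidate, db) >= min_util:
                        next_frontier.append(candidate)
        frontier = next_frontier
    return result


high_utility = mine(transactions, min_util=15)
print(high_utility)
```

This sketch rescans the database for every candidate, which is the repeated utility computation the abstract says a condensed per-itemset structure is meant to avoid.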
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms
