An Efficient Data Structure for Fast Mining High Utility Itemsets

Zhi-Hong Deng; Shulei Ma; He Liu

arXiv:1510.02188 · cs.DB · October 9, 2015 · 1 cites

Open Access

TL;DR

This paper introduces PUN-list, a data structure that speeds up high utility itemset mining by reducing utility computations and enabling effective pruning. Extensive experiments demonstrate the speedup.

Contribution

The paper presents PUN-list, a novel data structure, and MIP, a fast mining method that outperforms existing algorithms in high utility itemset mining.

Findings

01. MIP is at least ten times faster than recent algorithms.

02. PUN-list effectively reduces utility computation costs.

03. The method works efficiently on both synthetic and real datasets.

Abstract

In this paper, we propose a novel data structure called PUN-list, which maintains both the utility information about an itemset and a utility upper bound, to facilitate mining high utility itemsets. Based on PUN-lists, we present a method, called MIP (Mining high utility Itemset using PUN-Lists), for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUN-list of the itemset, and the PUN-lists of long itemsets can be quickly constructed from the PUN-lists of short itemsets. Third, by employing the utility upper bound stored in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search…
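
The three techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the internal layout of a PUN-list, so the code below is not the authors' structure or the MIP algorithm. It substitutes a simpler per-transaction utility list (one entry per transaction, holding the itemset's utility and the utility of the items that follow it in a fixed order) to show the same three ideas: a list that stores utilities so they are not recomputed, lists of longer itemsets built by joining lists of shorter ones, and an upper bound read from the list to prune the search. All names (`Entry`, `build_item_lists`, `join`, `mine`), the item order, and the toy database are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    tid: int    # transaction id
    iutil: int  # utility of the itemset in this transaction
    rutil: int  # utility of items after the itemset (in `order`) in this transaction


def build_item_lists(db, profit, order):
    """One list per single item. `db` is a list of {item: quantity} dicts."""
    lists = {item: [] for item in order}
    for tid, tx in enumerate(db):
        items = [i for i in order if i in tx]
        remaining = sum(tx[i] * profit[i] for i in items)
        for i in items:
            u = tx[i] * profit[i]
            remaining -= u
            lists[i].append(Entry(tid, u, remaining))
    return lists


def join(px, py, p=None):
    """Build the list of P+x+y from the lists of P+x and P+y (p is the list of P)."""
    y_by_tid = {e.tid: e for e in py}
    p_by_tid = {e.tid: e for e in p} if p is not None else None
    out = []
    for ex in px:
        ey = y_by_tid.get(ex.tid)
        if ey is None:
            continue
        # The prefix utility is counted in both ex and ey, so subtract it once.
        shared = p_by_tid[ex.tid].iutil if p_by_tid is not None else 0
        out.append(Entry(ex.tid, ex.iutil + ey.iutil - shared, ey.rutil))
    return out


def utility(ul):
    return sum(e.iutil for e in ul)


def upper_bound(ul):
    # No extension of the itemset can exceed this total.
    return sum(e.iutil + e.rutil for e in ul)


def mine(lists, order, minutil):
    results = {}

    def search(prefix, prefix_list, exts):
        for k, (x, ulx) in enumerate(exts):
            itemset = prefix + (x,)
            if utility(ulx) >= minutil:
                results[itemset] = utility(ulx)
            if upper_bound(ulx) >= minutil:  # otherwise prune the whole subtree
                new_exts = [(y, join(ulx, uly, prefix_list)) for y, uly in exts[k + 1:]]
                search(itemset, ulx, new_exts)

    search((), None, [(i, lists[i]) for i in order])
    return results


# Toy data (hypothetical): quantities per transaction and unit profits.
db = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "c": 3}, {"b": 1, "c": 2}]
profit = {"a": 5, "b": 2, "c": 1}
order = ["a", "b", "c"]
result = mine(build_item_lists(db, profit, order), order, minutil=10)
print(result)
```

With a minimum utility of 10, the sketch reports `a`, `ac`, and `abc`. The subtrees rooted at `b` and `c` are never expanded because their upper bounds (9 and 6) fall below the threshold, which is the kind of pruning the abstract attributes to the upper bound kept in the list.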

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms