Mining Top-K Co-Occurrence Items

Zhi-Hong Deng

arXiv:1512.07806 · cs.DB · December 25, 2015 · 1 citation

TL;DR

This paper introduces a new task, top-k co-occurrence item mining, and proposes a data structure (Pi-Tree) and efficient algorithms for finding the k items that most frequently co-occur with a given query itemset. The proposed algorithms outperform baseline methods.

Contribution

The paper presents a novel mining task, the Pi-Tree data structure, and two algorithms, PT and PT-TA, which significantly improve efficiency and scalability in top-k co-occurrence item mining.

Findings

1. The PT algorithm outperforms the baseline algorithms in execution time.

2. PT-TA, with pruning, further improves efficiency.

3. Both algorithms show excellent scalability on synthetic and real datasets.

Abstract

Frequent itemset mining has emerged as a fundamental problem in data mining and plays an important role in many data mining tasks, such as association analysis, classification, etc. In the framework of frequent itemset mining, the results are itemsets that are frequent in the whole database. However, in some applications, such as recommendation systems and social networks, people are more interested in finding out the items that occur with some user-specified itemsets (query itemsets) most frequently in a database. In this paper, we address the problem by proposing a new mining task named top-k co-occurrence item mining, where k is the desired number of items to be found. Four baseline algorithms are presented first. Then, we introduce a special data structure named Pi-Tree (Prefix itemset Tree) to maintain the information of itemsets. Based on Pi-Tree, we propose two algorithms, namely PT…
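To make the task concrete, the sketch below assumes the natural reading of the abstract: for a query itemset P, the co-occurrence count of an item x (not in P) is the number of transactions that contain every item of P together with x, and the answer is the k items with the highest counts. It is a hypothetical naive single-scan approach written for illustration only; it is not one of the paper's four baseline algorithms, nor the Pi-Tree-based PT or PT-TA algorithms, whose details are not given here. The function name `top_k_cooccurrence`, the sample database, and the tie-breaking rule are all assumptions.

```python
import heapq
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_cooccurrence(
    transactions: Iterable[Iterable[Hashable]],
    query: Iterable[Hashable],
    k: int,
) -> List[Tuple[Hashable, int]]:
    """Hypothetical naive baseline: one full scan of the database.

    Returns up to k (item, count) pairs for items outside the query itemset,
    ordered by descending co-occurrence count; ties are broken by the item's
    string form (an arbitrary choice made only for deterministic output).
    """
    query_set = frozenset(query)
    counts: Counter = Counter()
    for transaction in transactions:
        items = set(transaction)
        # Only transactions containing every query item contribute counts.
        if query_set <= items:
            counts.update(items - query_set)
    # nsmallest on (-count, str(item)) selects the k highest counts via a heap.
    return heapq.nsmallest(k, counts.items(), key=lambda pair: (-pair[1], str(pair[0])))


# Small made-up database for illustration only.
DATABASE = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "e"},
]

print(top_k_cooccurrence(DATABASE, {"a"}, 2))       # [('b', 3), ('c', 3)]
print(top_k_cooccurrence(DATABASE, {"a", "b"}, 2))  # [('c', 2), ('d', 1)]
```

Every query in this sketch rescans the entire database. The paper reports that its Pi-Tree-based algorithms outperform the baselines in execution time, but how Pi-Tree organizes itemsets is beyond what this sketch shows.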

Taxonomy

Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Rough Sets and Fuzzy Logic