Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation

Yongge Wang; Xintao Wu

arXiv:1207.5466·cs.DB·July 24, 2012

TL;DR

This paper shows that approximate inverse frequent itemset mining is NP-complete, proposes and analyzes an approximate algorithm for it, and discusses privacy issues in synthetic basket data generation.

Contribution

It proves that approximate inverse frequent itemset mining is NP-complete, and offers an approximate algorithm for the problem along with an approximate algorithm for determining privacy leakage.

Findings

1. Approximate inverse frequent itemset mining is NP-complete.

2. An approximate algorithm for the problem is proposed and analyzed.

3. An approximate algorithm is proposed for determining the privacy leakage in a synthetic basket data set.

Abstract

In order to generate synthetic basket data sets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket data sets. The characteristics that could be used for this purpose include the frequent itemsets and association rules. The problem of generating synthetic basket data sets from frequent itemsets is generally referred to as inverse frequent itemset mining. In this paper, we show that the problem of approximate inverse frequent itemset mining is NP-complete. Then we propose and analyze an approximate algorithm for approximate inverse frequent itemset mining, and discuss privacy issues related to the synthetic basket data set. In particular, we propose an approximate algorithm to determine the privacy leakage in a synthetic basket data set.
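To make the problem statement concrete, the sketch below shows only the verification side of inverse frequent itemset mining: given a candidate synthetic basket data set and target itemset supports, it checks whether every support is matched within a tolerance. It is not the paper's algorithm. The function names, the absolute-count tolerance, and the toy data are all assumptions made for illustration; this page does not state the paper's precise definition of "approximate".

```python
from typing import Dict, FrozenSet, List

Transaction = FrozenSet[str]


def support(database: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for transaction in database if itemset <= transaction)


def matches_approximately(
    database: List[Transaction],
    targets: Dict[FrozenSet[str], int],
    tolerance: int,
) -> bool:
    """Return True if each target support is matched within `tolerance`.

    `tolerance` is an assumed absolute slack on the support count,
    chosen for illustration only.
    """
    return all(
        abs(support(database, itemset) - wanted) <= tolerance
        for itemset, wanted in targets.items()
    )


# Hypothetical candidate synthetic data set with three baskets.
synthetic = [
    frozenset({"bread", "milk"}),
    frozenset({"bread"}),
    frozenset({"milk", "eggs"}),
]

# Hypothetical target supports taken from some "real" database.
targets = {
    frozenset({"bread"}): 2,
    frozenset({"milk"}): 2,
    frozenset({"bread", "milk"}): 2,
}

print(matches_approximately(synthetic, targets, tolerance=1))
print(matches_approximately(synthetic, targets, tolerance=0))
```

Checking a given candidate like this is cheap; the hardness result in the paper concerns deciding whether a data set meeting the constraints exists at all.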

Taxonomy

Topics: Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data · Imbalanced Data Classification Techniques