Approximate Inverse Frequent Itemset Mining: Privacy, Complexity, and Approximation
Yongge Wang, Xintao Wu

TL;DR
This paper shows that approximate inverse frequent itemset mining is NP-complete, proposes an approximate algorithm for it, and discusses privacy concerns in synthetic basket data generation.
Contribution
It proves that approximate inverse frequent itemset mining is NP-complete, and offers an approximate algorithm for the problem along with an approximate algorithm for assessing privacy leakage.
Findings
Approximate inverse frequent itemset mining is NP-complete.
An approximate algorithm for the problem is proposed and analyzed.
An approximate algorithm is proposed to determine the privacy leakage in a synthetic basket data set.
Abstract
In order to generate synthetic basket data sets for better benchmark testing, it is important to integrate characteristics from real-life databases into the synthetic basket data sets. The characteristics that could be used for this purpose include the frequent itemsets and association rules. The problem of generating synthetic basket data sets from frequent itemsets is generally referred to as inverse frequent itemset mining. In this paper, we show that the problem of approximate inverse frequent itemset mining is NP-complete. Then we propose and analyze an approximate algorithm for approximate inverse frequent itemset mining, and discuss privacy issues related to the synthetic basket data set. In particular, we propose an approximate algorithm to determine the privacy leakage in a synthetic basket data set.
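To make the problem statement concrete, here is a minimal Python sketch of the checking side of inverse frequent itemset mining: given a candidate synthetic basket data set and target supports for some itemsets, it tests whether every support is matched within a tolerance. The names `support`, `approx_consistent`, and `tolerance`, and the use of an absolute-count tolerance, are assumptions made for illustration; the paper's formal definition of approximation may differ, and this is not the paper's algorithm.

```python
from typing import FrozenSet, Iterable, Mapping, Sequence

Itemset = FrozenSet[str]


def support(db: Sequence[Iterable[str]], itemset: Itemset) -> int:
    """Count the transactions in db that contain every item of itemset."""
    return sum(1 for transaction in db if itemset <= set(transaction))


def approx_consistent(
    db: Sequence[Iterable[str]],
    targets: Mapping[Itemset, int],
    tolerance: int,
) -> bool:
    """Return True if each target support is matched within +/- tolerance.

    The absolute-count tolerance is an illustrative assumption,
    not the paper's definition of approximation.
    """
    return all(
        abs(support(db, itemset) - wanted) <= tolerance
        for itemset, wanted in targets.items()
    )


# Hypothetical candidate data set with four baskets.
db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# Hypothetical target supports taken from some "real" database.
targets = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 3,
}

print(approx_consistent(db, targets, tolerance=0))
print(approx_consistent(db, targets, tolerance=1))
```

Checking a given candidate takes one pass over the data set per itemset. The hard part, which the NP-completeness result concerns, is finding a data set that passes such a check in the first place.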
Taxonomy
Topics: Data Mining Algorithms and Applications · Privacy-Preserving Technologies in Data · Imbalanced Data Classification Techniques
