A Model-Based Frequency Constraint for Mining Associations from Transaction Data
Michael Hahsler

TL;DR
This paper introduces a model-based frequency constraint using a stochastic mixture model to improve the identification of frequent itemsets in transaction data, addressing issues with traditional support thresholds.
Contribution
It proposes a novel, model-driven frequency constraint that adapts to data distribution, enhancing the robustness and interpretability of association mining.
Findings
Improved detection of relevant itemsets over traditional support thresholds
The new constraint is more robust and easier for users to interpret
Experimental results show better performance on public datasets
Abstract
Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the association's significance. A single user-specified support threshold is used to decide if associations should be further investigated. Support has some known problems with rare items, favors shorter itemsets and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which allows for transaction data's typically highly skewed item frequency distribution. A user-specified precision threshold is used together with…
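To make the baseline concrete, the following is a minimal sketch of the traditional approach the abstract describes: support is computed as the fraction of transactions that contain an itemset, and one global minimum support decides which itemsets are kept. It does not implement the paper's model-based constraint. The function names, the toy transactions, and the threshold value are illustrative assumptions, and the brute-force enumeration is only suitable for tiny examples.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, max_size=2):
    """Return the relative support of every itemset with up to max_size items.

    Support here is the fraction of transactions containing all items
    of the itemset (brute-force enumeration, for illustration only).
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicates, fix item order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {itemset: count / n for itemset, count in counts.items()}


def frequent_itemsets(transactions, min_support, max_size=2):
    """Keep only itemsets whose support reaches a single global threshold."""
    support = itemset_support(transactions, max_size)
    return {s: v for s, v in support.items() if v >= min_support}


# Hypothetical toy data: "caviar" is a rare item.
transactions = [
    ["bread", "milk"],
    ["bread", "milk", "caviar"],
    ["bread"],
    ["milk"],
    ["bread", "milk"],
]

support = itemset_support(transactions)
frequent = frequent_itemsets(transactions, min_support=0.4)
print(sorted(frequent))
```

With this toy data, every itemset containing the rare item "caviar" falls below the threshold regardless of how strongly it is associated with other items, which illustrates the rare-item problem the abstract attributes to a single support threshold.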
