Mining Flipping Correlations from Large Datasets with Taxonomies
Marina Barsky, Sangkyum Kim, Tim Weninger, Jiawei Han

TL;DR
This paper introduces flipping correlation patterns: correlations between items that switch from positive to negative, or vice versa, when the items are generalized to a higher level of abstraction. It also presents Flipper, an efficient algorithm for mining these patterns in large datasets, and shows that the patterns it finds are non-redundant and actionable.
Contribution
The paper presents the Flipper algorithm for efficiently mining flipping correlation patterns, a novel pattern type contrasting correlations across abstraction levels.
Findings
Flipper outperforms naive methods by several orders of magnitude.
Discovered patterns are non-redundant, surprising, and actionable.
Flipper finds strong contrasting correlations in itemsets with low-to-medium support, a frequency range that existing techniques cannot handle.
Abstract
In this paper we introduce a new type of pattern: the flipping correlation pattern. Flipping patterns are obtained by contrasting the correlations between items at different levels of abstraction. They represent surprising correlations, both positive and negative, which are specific to a given abstraction level, and which "flip" from positive to negative and vice versa when items are generalized to a higher level of abstraction. We design an efficient algorithm for finding flipping correlations, the Flipper algorithm, which outperforms naive pattern mining methods by several orders of magnitude. We apply Flipper to real-life datasets and show that the discovered patterns are non-redundant, surprising and actionable. Flipper finds strong contrasting correlations in itemsets with low-to-medium support, while existing techniques cannot handle the pattern discovery in this frequency…
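To make the idea of a "flip" concrete, here is a minimal brute-force sketch. It is not the Flipper algorithm. The excerpt above does not name the correlation measure or its thresholds, so the sketch assumes the Kulczynski measure with illustrative cutoffs of 0.6 (positive) and 0.3 (negative). The taxonomy, item names, transactions, and all function names are made up for illustration.

```python
# Toy illustration of a flipping correlation.
# All names, data, and thresholds here are hypothetical assumptions.

# Assumed two-level taxonomy: leaf item -> parent category.
TAXONOMY = {
    "skim_milk": "dairy",
    "cheddar": "dairy",
    "rye_bread": "bakery",
    "donut": "bakery",
}

# Made-up transactions: skim milk and rye bread always appear together,
# but most dairy purchases and most bakery purchases happen separately.
TRANSACTIONS = (
    [{"skim_milk", "rye_bread"}] * 3
    + [{"cheddar"}] * 8
    + [{"donut"}] * 8
)


def generalize(transactions, taxonomy):
    """Replace every item by its parent category."""
    return [{taxonomy[item] for item in t} for t in transactions]


def support(itemset, transactions):
    """Count the transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def kulczynski(a, b, transactions):
    """Average of P(b | a) and P(a | b); an assumed choice of measure."""
    sup_a = support({a}, transactions)
    sup_b = support({b}, transactions)
    if sup_a == 0 or sup_b == 0:
        return 0.0
    sup_ab = support({a, b}, transactions)
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def label(value, positive_min=0.6, negative_max=0.3):
    """Map a correlation value to a label using illustrative cutoffs."""
    if value >= positive_min:
        return "positive"
    if value <= negative_max:
        return "negative"
    return "neutral"


def correlation_by_level(a, b, transactions, taxonomy):
    """Return the correlation of (a, b) at the leaf level and at the parent level."""
    leaf = kulczynski(a, b, transactions)
    parent = kulczynski(taxonomy[a], taxonomy[b], generalize(transactions, taxonomy))
    return leaf, parent


def is_flipping(leaf, parent):
    """True when one level is labelled positive and the other negative."""
    return {label(leaf), label(parent)} == {"positive", "negative"}


leaf, parent = correlation_by_level("skim_milk", "rye_bread", TRANSACTIONS, TAXONOMY)
print(label(leaf), label(parent), is_flipping(leaf, parent))
```

On this toy data the pair (skim_milk, rye_bread) scores 1.0 at the leaf level and 3/11 (about 0.27) once generalized to (dairy, bakery), so the label flips from positive to negative. Checking every pair at every level this way is the kind of naive enumeration that the abstract says Flipper outperforms.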
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Imbalanced Data Classification Techniques
