Discovery data topology with the closure structure. Theoretical and practical aspects
Tatiana Makhalova, Aleksey Buzmakov, Sergei O. Kuznetsov, Amedeo Napoli

TL;DR
This paper introduces the closure structure, based on Formal Concept Analysis, as a concise way to understand the topology and complexity of binary datasets, with practical demonstrations using the GDPM algorithm.
Contribution
It formalizes the closure structure for dataset topology analysis and presents the GDPM algorithm for practical exploration of data complexity and distribution.
Findings
GDPM characterizes dataset topology in terms of complexity levels
The closure structure captures intrinsic dataset content effectively
Experiments demonstrate practical utility of GDPM in data analysis
Abstract
In this paper, we revisit pattern mining, and especially itemset mining, which allows one to analyze binary datasets in search of interesting and meaningful association rules and their respective itemsets in an unsupervised way. While a summarization of a dataset based on a set of patterns does not provide a general and satisfying view of the dataset, we introduce a concise representation -- the closure structure -- based on closed itemsets and their minimum generators, for capturing the intrinsic content of a dataset. The closure structure allows one to understand the topology of the dataset as a whole and the inherent complexity of the data. We propose a formalization of the closure structure in terms of Formal Concept Analysis, which is well adapted to the study of this data topology. We present and demonstrate theoretical results, as well as practical results using the GDPM…
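To make the notions of closed itemset and minimum generator concrete, here is a minimal brute-force sketch on a toy binary dataset. It is not the GDPM algorithm. The function names, the toy transactions, and the choice to group each closed itemset by the size of its smallest generator are illustrative assumptions, meant only to suggest how "complexity levels" could be read.

```python
from itertools import combinations


def closure(itemset, transactions):
    """Return the items shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # No transaction supports the itemset; by convention its closure is all items.
        return frozenset().union(*transactions)
    return frozenset.intersection(*covering)


def closed_itemsets_by_level(transactions):
    """Map each closed itemset to the size of its smallest generator.

    Hypothetical helper (brute force, exponential in the number of items):
    candidates are enumerated by increasing size, so the first candidate whose
    closure equals a given closed itemset is one of its smallest generators.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted(frozenset().union(*transactions))
    levels = {}
    for k in range(len(items) + 1):
        for candidate in combinations(items, k):
            closed = closure(frozenset(candidate), transactions)
            levels.setdefault(closed, k)  # keep the smallest size seen
    return levels


# Toy dataset (assumed for illustration): each set is one row of a binary table.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
levels = closed_itemsets_by_level(toy)
for closed, k in sorted(levels.items(), key=lambda kv: (kv[1], sorted(kv[0]))):
    print(k, sorted(closed))
```

On this toy input, the itemset {a, b, c} is closed and its smallest generator is {b, c}, so the sketch places it at size 2. Exhaustive enumeration is only feasible for a handful of items.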
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Advanced Database Systems and Queries
