Detection of Common Subtrees with Identical Label Distribution
Romain Azaïs, Florian Ingels

TL;DR
This paper introduces a method for detecting common subtrees with identical label distribution in tree data. The underlying isomorphism problem is graph isomorphism complete; the paper addresses it with a dedicated search algorithm and uses a new lossless compression scheme for trees, DAG-RW, to enumerate the patterns.
Contribution
It presents a new pattern type for trees (common subtrees with identical label distribution), a search algorithm to detect it, analysed both theoretically and numerically, and a lossless compression scheme for trees, DAG-RW, through which the patterns are enumerated.
Findings
The search algorithm shows very good computation times
Patterns are enumerated through the lossless DAG-RW compression scheme
Patterns offer more parsimonious data representation
Abstract
Frequent pattern mining is a relevant method to analyse structured data, like sequences, trees or graphs. It consists in identifying characteristic substructures of a dataset. This paper deals with a new type of patterns for tree data: common subtrees with identical label distribution. Their detection is far from obvious since the underlying isomorphism problem is graph isomorphism complete. An elaborated search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation times and analysis of real datasets from the literature. Compared to other substructures like topological subtrees and labelled subtrees for which the isomorphism…
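To make the pattern type more concrete, here is a minimal sketch of a candidate filter. It assumes the simplest literal reading of "identical label distribution": two subtrees are candidates when they have the same unlabelled shape and the same multiset of labels. The paper's formal definition may differ, and this is neither the authors' search algorithm nor DAG-RW. The `(label, children)` tree encoding and the function names `shape`, `label_counts`, `subtrees` and `candidate_groups` are hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical encoding: a tree is a (label, children) pair,
# where children is a list of trees and sibling order is irrelevant.

def shape(tree):
    """Canonical string of the unlabelled shape (sorted nested parentheses)."""
    _, children = tree
    return "(" + "".join(sorted(shape(c) for c in children)) + ")"

def label_counts(tree):
    """Multiset of the labels carried by the nodes of the tree."""
    label, children = tree
    counts = Counter([label])
    for child in children:
        counts.update(label_counts(child))
    return counts

def subtrees(tree):
    """Yield the subtree rooted at every node, root included."""
    yield tree
    for child in tree[1]:
        yield from subtrees(child)

def candidate_groups(tree):
    """Group subtrees sharing both shape and label multiset.

    Assumption: this is only a coarse filter. Two subtrees in the same
    group may still arrange their labels differently, so a real matching
    step would be needed afterwards.
    """
    groups = {}
    for sub in subtrees(tree):
        key = (shape(sub), frozenset(label_counts(sub).items()))
        groups.setdefault(key, []).append(sub)
    return [group for group in groups.values() if len(group) > 1]

example = ("r", [
    ("a", [("x", []), ("y", [])]),
    ("a", [("y", []), ("x", [])]),
])
for group in candidate_groups(example):
    print(len(group), shape(group[0]), dict(label_counts(group[0])))
```

Both quantities are cheap to compute, which is why they only serve as a filter here: a chain a-x-y and a chain a-y-x land in the same group even though their labels sit at different depths. Deciding whether such candidates truly match is the part the abstract describes as graph isomorphism complete.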
Taxonomy
Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Machine Learning and Data Classification
