Detection of Common Subtrees with Identical Label Distribution

Romain Azaïs; Florian Ingels

arXiv:2307.13068·cs.DS·January 5, 2024

Open Access

TL;DR

This paper introduces a new type of pattern for tree data, common subtrees with identical label distribution, whose underlying isomorphism problem is graph-isomorphism complete. It develops a search algorithm to detect these patterns and a new lossless compression scheme, DAG-RW, to enumerate them.

Contribution

The paper presents a new pattern type for trees, a detection algorithm analysed from both theoretical and numerical perspectives, and a lossless compression scheme for trees (DAG-RW) through which the patterns are enumerated.

Findings

01 Algorithm performs well in computation time

02 Effective pattern enumeration with DAG-RW compression

03 Patterns offer more parsimonious data representation

Abstract

Frequent pattern mining is a relevant method for analysing structured data such as sequences, trees or graphs. It consists in identifying characteristic substructures of a dataset. This paper deals with a new type of pattern for tree data: common subtrees with identical label distribution. Their detection is far from obvious, since the underlying isomorphism problem is graph-isomorphism complete. An elaborate search algorithm is developed and analysed from both theoretical and numerical perspectives. Based on this, the enumeration of patterns is performed through a new lossless compression scheme for trees, called DAG-RW, whose complexity is investigated as well. The method shows very good properties, both in terms of computation times and analysis of real datasets from the literature. Compared to other substructures like topological subtrees and labelled subtrees for which the isomorphism…
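
To make the notion of "label distribution" concrete, here is a minimal Python sketch of a coarse filter that two trees must presumably pass before they can match: same unordered shape and same multiset of labels. It is not the paper's search algorithm, nor the DAG-RW scheme, and it does not decide the isomorphism problem described above. The tree encoding `(label, [children])` and the names `topo_key`, `label_distribution` and `passes_coarse_filter` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical encoding: a tree is a pair (label, list_of_child_trees).

def topo_key(tree):
    """Canonical form of the unordered shape of a tree, ignoring labels."""
    _label, children = tree
    return tuple(sorted(topo_key(child) for child in children))

def label_distribution(tree):
    """Multiset of the labels carried by all nodes of the tree."""
    label, children = tree
    dist = Counter([label])
    for child in children:
        dist += label_distribution(child)
    return dist

def passes_coarse_filter(t1, t2):
    """True if t1 and t2 share the same unordered shape and label multiset.

    This is only a cheap pre-check (an assumption of this sketch); two trees
    that pass it are candidates, not confirmed matches.
    """
    return (topo_key(t1) == topo_key(t2)
            and label_distribution(t1) == label_distribution(t2))

# Same shape, same labels placed on different nodes.
t1 = ("a", [("b", []), ("c", [("b", [])])])
t2 = ("b", [("c", [("a", [])]), ("b", [])])
# Same shape as t1, but a different label multiset.
t3 = ("a", [("a", []), ("c", [("b", [])])])

print(passes_coarse_filter(t1, t2))
print(passes_coarse_filter(t1, t3))
```

The filter runs in polynomial time, which is exactly why it cannot be the whole story: the abstract states that the actual isomorphism problem is graph-isomorphism complete, so any pair that passes still needs the full search.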

Taxonomy

Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Machine Learning and Data Classification