Mining Rooted Ordered Trees under Subtree Homeomorphism
Mostafa Haghir Chehreghani, Maurice Bruynooghe

TL;DR
This paper introduces TPMiner, an efficient algorithm for mining frequent rooted ordered tree patterns under subtree homeomorphism. It relies on a compact data structure, occ, to speed up frequency counting relative to existing methods.
Contribution
The paper presents an efficient algorithm for subtree homeomorphism and a compact data structure, occ, that stores only the rightmost paths of occurrences, and applies both to frequent pattern mining in tree-structured data.
Findings
TPMiner outperforms existing algorithms in efficiency.
The occ data structure stores only the rightmost paths of occurrences, so a single entry can represent several occurrences of a pattern.
Experimental results show significant improvements on real-world and synthetic datasets.
Abstract
Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree…
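To make the matching operator concrete, here is a minimal Python sketch of subtree homeomorphism on rooted ordered trees. It is a plain backtracking check, not TPMiner and not the occ data-structure, and it rests on assumptions that are not taken from the paper: trees are nested `(label, [children])` tuples, labels must match exactly, and sibling pattern nodes must map, in left-to-right order, onto nodes whose subtrees do not overlap. The names `flatten`, `is_embedded` and `support` are hypothetical, and `support` assumes per-tree frequency (the number of database trees that contain at least one occurrence).

```python
from functools import lru_cache

def flatten(tree):
    """Number nodes in preorder; return (labels, end, kids) lists.

    end[i] is the preorder index of the last descendant of node i,
    so the proper descendants of i are exactly i+1 .. end[i].
    """
    labels, end, kids = [], [], []

    def visit(node):
        label, children = node
        idx = len(labels)
        labels.append(label)
        end.append(idx)
        kids.append([])
        for child in children:
            kids[idx].append(visit(child))
        end[idx] = len(labels) - 1
        return idx

    visit(tree)
    return labels, end, kids

def is_embedded(pattern, tree):
    """True if pattern occurs in tree under the assumptions stated above."""
    p_lab, _, p_kids = flatten(pattern)
    t_lab, t_end, _ = flatten(tree)

    @lru_cache(maxsize=None)
    def match(p, t):
        # Pattern node p maps onto tree node t only if the labels agree
        # and all children of p fit among the proper descendants of t.
        if p_lab[p] != t_lab[t]:
            return False
        return place(p, 0, t + 1, t_end[t])

    def place(p, i, lo, hi):
        # Map the i-th and later children of p into preorder range lo..hi.
        if i == len(p_kids[p]):
            return True
        child = p_kids[p][i]
        for d in range(lo, hi + 1):
            # The next sibling must start after the whole subtree of d,
            # which keeps sibling images ordered and non-overlapping.
            if match(child, d) and place(p, i + 1, t_end[d] + 1, hi):
                return True
        return False

    return any(match(0, t) for t in range(len(t_lab)))

def support(pattern, database):
    """Per-tree frequency (assumed): number of trees containing the pattern."""
    return sum(is_embedded(pattern, tree) for tree in database)
```

For example, with the tree `("a", [("x", [("b", [])]), ("c", [])])`, the pattern `("a", [("b", []), ("c", [])])` is embedded because the edge a-b maps onto the ancestor-descendant pair a, b through the intermediate node x. The same pattern with the two children swapped is not embedded, because sibling order is preserved.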
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms
