A Comparison of Polynomial-Based Tree Clustering Methods
Pengyu Liu, Mariel Vázquez, and Nataša Jonoska

TL;DR
This paper compares polynomial-based distance methods for clustering tree-structured data from the biological sciences, showing that normalized distances yield the best clustering accuracy.
Contribution
It introduces a systematic comparison of polynomial-based distance metrics for tree clustering and evaluates autoencoder models for this purpose.
Findings
Methods based on normalized distances achieve the highest clustering accuracy
Tree polynomials enable efficient and interpretable encoding of biological tree data
Autoencoder models can be effectively used for tree clustering
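To make the "trees as matrices" encoding concrete, here is a minimal Python sketch. It assumes the recursive bivariate polynomial from Liu's earlier work on tree-distinguishing polynomials: a single vertex maps to x, and a tree whose root has subtrees T1, ..., Tk maps to y + P(T1)···P(Tk). This summary does not state which polynomial variant the paper uses, and the helper names (`tree_polynomial`, `coefficient_matrix`) are hypothetical. The coefficient of x^i y^j is stored at row i, column j of the matrix.

```python
from collections import defaultdict

# A polynomial is a dict mapping (power of x, power of y) -> integer coefficient.
# A rooted tree is a nested list: each vertex is the list of its children,
# so [] is a single vertex (a leaf).


def poly_mul(p, q):
    """Multiply two bivariate polynomials stored in dict form."""
    out = defaultdict(int)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return dict(out)


def tree_polynomial(tree):
    """Assumed recursion: P(single vertex) = x; otherwise P(T) = y + product of P(subtree)."""
    if not tree:
        return {(1, 0): 1}
    prod = {(0, 0): 1}
    for child in tree:
        prod = poly_mul(prod, tree_polynomial(child))
    prod[(0, 1)] = prod.get((0, 1), 0) + 1  # add the y term
    return prod


def coefficient_matrix(poly, n_rows, n_cols):
    """Place the coefficient of x^i y^j at row i, column j of an n_rows x n_cols zero matrix.

    The shape must be large enough to hold every term (otherwise IndexError is raised).
    """
    matrix = [[0] * n_cols for _ in range(n_rows)]
    for (i, j), c in poly.items():
        matrix[i][j] = c
    return matrix


# Root with two children: the first is a cherry (two leaves), the second is a leaf.
tree = [[[], []], []]
p = tree_polynomial(tree)            # x^3 + x*y + y
print(coefficient_matrix(p, 4, 2))   # [[0, 1], [0, 1], [0, 0], [1, 0]]
```

Padding every matrix to a common shape is what lets entrywise distances be computed between trees of different sizes.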
Abstract
Tree structures appear in many fields of the life sciences, including phylogenetics, developmental biology and nucleic acid structures. Trees can be used to represent RNA secondary structures, which directly relate to the function of non-coding RNAs. Recent developments in sequencing technology and artificial intelligence have yielded large amounts of biological data that can be represented with tree structures, which calls for novel methods for analyzing tree-structured data. Tree polynomials provide a computationally efficient, interpretable and comprehensive way to encode tree structures as matrices, which are compatible with most data analytics tools. Machine learning methods based on the Canberra distance between tree polynomials have been introduced to analyze phylogenies and nucleic acid structures. In this paper, we compare the performance of different distances in tree clustering methods…
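The abstract refers to the Canberra distance between tree polynomials, and the findings favor normalized distances. The sketch below computes the standard Canberra distance, the sum over entries of |a - b| / (|a| + |b|) with 0/0 treated as 0, between two equal-shape coefficient matrices, plus one possible normalized variant. The normalization used here (dividing by the number of entries that are nonzero in either matrix) is an assumption for illustration only; the paper's exact normalization is not stated in this summary, and the function names are hypothetical.

```python
# Coefficient matrices are lists of rows (rows: powers of x, columns: powers of y).


def _flatten(matrix):
    return [v for row in matrix for v in row]


def canberra(a, b):
    """Canberra distance between equal-shape matrices; entries where both are 0 contribute 0."""
    fa, fb = _flatten(a), _flatten(b)
    if len(fa) != len(fb):
        raise ValueError("coefficient matrices must have the same shape")
    total = 0.0
    for u, v in zip(fa, fb):
        denom = abs(u) + abs(v)
        if denom:
            total += abs(u - v) / denom
    return total


def normalized_canberra(a, b):
    """Hypothetical normalization: divide by the number of entries nonzero in a or b.

    Each contributing term lies in [0, 1], so the result lies in [0, 1].
    """
    distance = canberra(a, b)  # also validates that the shapes match
    support = sum(1 for u, v in zip(_flatten(a), _flatten(b)) if u or v)
    return distance / support if support else 0.0


# Coefficient matrices padded to a common 4 x 2 shape.
cherry = [[0, 1], [0, 0], [1, 0], [0, 0]]   # x^2 + y
other = [[0, 1], [0, 1], [0, 0], [1, 0]]    # x^3 + x*y + y
print(canberra(cherry, other))              # 3.0
print(normalized_canberra(cherry, other))   # 0.75
```

Dividing by the support keeps distances between large trees, whose matrices have many nonzero entries, on the same [0, 1] scale as distances between small trees; this is one plausible reason a normalized distance may behave differently in clustering.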
Taxonomy
Topics: Genomics and Phylogenetic Studies · Fractal and DNA sequence analysis · Bioinformatics and Genomic Networks
