TL;DR
This study uses topological descriptors and clustering algorithms to estimate the size and diversity of the set of RNA-like graph motifs, identifying candidate new RNA structures among all 110,667 enumerated dual graphs with 2 to 9 vertices.
Contribution
Introduces a novel computational-topology method, combined with unsupervised machine learning, to classify graph motifs as RNA-like or non-RNA-like and to estimate the universe of possible RNA structures.
Findings
97.3% of known RNA graphs are correctly clustered as RNA-like
Approximately 46% of hypothetical graphs are predicted to be RNA-like
Topological features distinguish RNA-like from non-RNA-like graphs
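To give a sense of scale, the percentages above can be converted into approximate graph counts using the total of 110,667 enumerated dual graphs and the 0.11% known fraction quoted in the abstract. This is only a back-of-the-envelope sketch: both percentages are rounded in the source, so the resulting counts are estimates rather than figures reported by the authors, and the variable names are illustrative.

```python
# Figures quoted in this summary; the percentages are rounded,
# so every derived count below is approximate.
TOTAL_DUAL_GRAPHS = 110_667   # enumerated dual graphs with 2 to 9 vertices
KNOWN_FRACTION = 0.0011       # 0.11% correspond to known RNA fragments
RNA_LIKE_FRACTION = 0.46      # ~46% of hypothetical graphs predicted RNA-like

# Graphs that already map to known RNA structures.
known_count = round(TOTAL_DUAL_GRAPHS * KNOWN_FRACTION)

# The remaining "hypothetical" graphs (the 99.89% set).
hypothetical_count = TOTAL_DUAL_GRAPHS - known_count

# Hypothetical graphs that the clustering labels as RNA-like.
rna_like_estimate = round(hypothetical_count * RNA_LIKE_FRACTION)

print(f"known graphs:            ~{known_count}")
print(f"hypothetical graphs:     ~{hypothetical_count}")
print(f"predicted RNA-like:      ~{rna_like_estimate}")
```

Under these rounded inputs, roughly a hundred graphs are already known and on the order of fifty thousand hypothetical graphs are candidates for RNA-like topologies.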
Abstract
We introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. Among them, only 0.11% of the graphs correspond to approximately 200,000 known RNA atomic fragments (collected in 2021) using the RNA-as-Graphs (RAG) mapping method. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in the 99.89% hypothetical set are more likely to be associated with RNA structures, we apply computational topology descriptors using the Persistent Spectral Graphs (PSG) method to characterize each graph using 19 PSG-based features and use clustering algorithms that partition all possible dual graphs into two clusters,…
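The general shape of the pipeline described in the abstract (graph, then spectral feature vector, then two clusters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the features are simple power means of the graph Laplacian spectrum, used as a stand-in for the 19 PSG-based features; the clustering is a plain two-cluster k-means, since the excerpt does not name the algorithms used; and the four toy graphs are hand-written examples, not RAG dual graphs from the enumeration. The helper names (`laplacian`, `spectral_power_means`, `two_means`) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Graph Laplacian L = D - A for a symmetric (multi)graph adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square-matrix product (sufficient for graphs with few vertices)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_power_means(adj: Matrix, max_order: int = 3) -> List[float]:
    """Stand-in features: (tr(L^k) / n)^(1/k) for k = 1..max_order.

    tr(L^k) equals the sum of the k-th powers of the Laplacian eigenvalues,
    so these are power means of the spectrum, computed without an eigensolver.
    """
    lap = laplacian(adj)
    n = len(lap)
    power = lap
    feats = []
    for k in range(1, max_order + 1):
        if k > 1:
            power = matmul(power, lap)
        trace = sum(power[i][i] for i in range(n))
        feats.append((trace / n) ** (1.0 / k))
    return feats


def dist2(p: List[float], q: List[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def two_means(points: List[List[float]], iters: int = 20) -> List[int]:
    """Two-cluster k-means with a deterministic farthest-point initialisation."""
    first = points[0]
    far = max(points, key=lambda p: dist2(p, first))
    centers = [list(first), list(far)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist2(p, centers[0]) <= dist2(p, centers[1]) else 1
                  for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy graphs (illustrative only): two sparse paths and two dense graphs.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
double_triangle = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]   # triangle, doubled edges
complete4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

graphs = [path3, path4, double_triangle, complete4]
features = [spectral_power_means(g) for g in graphs]
labels = two_means(features)
print(labels)
```

On these toy inputs the two paths land in one cluster and the two dense graphs in the other. In the study itself, the cluster containing most of the known RNA graphs would be the one interpreted as RNA-like; the sketch does not attempt to reproduce that step.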
Taxonomy
Methods: Sparse Evolutionary Training
