A Misclassification Network-Based Method for Comparative Genomic Analysis
Wan He, Tina Eliassi-Rad, Samuel V. Scarpino

TL;DR
This paper introduces GMNA, a framework that combines AI classifiers with network science to classify genomes and to explain what drives misclassification. The authors demonstrate it on SARS-CoV-2 data and discuss implications for global health analysis.
Contribution
The study presents GMNA, a flexible, network-based method for genomic classification that uses misclassified instances and learned scoring to improve both interpretability and accuracy.
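The core idea of treating misclassified instances as signal, rather than discarding them, can be illustrated with a small sketch. It assumes that a misclassification network is a directed graph whose nodes are class labels (here, hypothetical sampling locations) and whose edge weights are the row-normalized off-diagonal counts of a confusion matrix. This is an illustrative construction, not necessarily GMNA's exact definition. The function names and toy labels are hypothetical.

```python
from collections import Counter


def misclassification_network(y_true, y_pred):
    """Build a directed, weighted misclassification network (hypothetical helper).

    Returns {(true_label, predicted_label): weight} for off-diagonal errors only.
    The weight is the fraction of instances with that true label that the
    classifier assigned to the predicted label (a row-normalized confusion matrix).
    """
    totals = Counter(y_true)
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return {edge: count / totals[edge[0]] for edge, count in errors.items()}


def strongest_edges(network, top=3):
    """Return the `top` heaviest edges, i.e. the most frequent confusions."""
    return sorted(network.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy example with hypothetical sampling locations as class labels.
y_true = ["USA", "USA", "USA", "UK", "UK", "India"]
y_pred = ["USA", "UK", "UK", "UK", "USA", "India"]
net = misclassification_network(y_true, y_pred)
print(strongest_edges(net))
```

In this toy case, two of the three USA-labeled genomes are predicted as UK, so the USA to UK edge gets weight 2/3. Heavy edges of this kind are where one would look for shared structure between classes, for example the mobility links mentioned in the findings.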
Findings
Successfully classified SARS-CoV-2 genomes by sampling location.
Revealed insights into how human mobility influences viral spread.
Demonstrated the framework's effectiveness with multiple AI models.
Abstract
Classifying genome sequences based on metadata has been an active area of research in comparative genomics for decades with many important applications across the life sciences. Established methods for classifying genomes can be broadly grouped into sequence alignment-based and alignment-free models. Conventional alignment-based models rely on genome similarity measures calculated based on local sequence alignments or consistent ordering among sequences. However, such methods are computationally expensive when dealing with large ensembles of even moderately sized genomes. In contrast, alignment-free (AF) approaches measure genome similarity based on summary statistics in an unsupervised setting and are efficient enough to analyze large datasets. However, both alignment-based and AF methods typically assume fixed scoring rubrics that lack the flexibility to assign varying importance to…
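A minimal sketch can make the alignment-free idea concrete. It uses one common family of AF summary statistics: k-mer frequency profiles compared with cosine similarity. The choice of k, the cosine measure, and the helper names are assumptions for illustration only. The paper may use different statistics.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count overlapping k-mers in a nucleotide string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(p, q):
    """Cosine similarity between two sparse k-mer count vectors.

    Indexing a Counter with a missing key returns 0 without inserting it,
    so k-mers absent from q contribute nothing to the dot product.
    """
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # e.g. a sequence shorter than k has no k-mers
    return dot / (norm_p * norm_q)


a = kmer_profile("ACGTACGT")
b = kmer_profile("ACGTTCGT")
print(cosine_similarity(a, b))
```

Each profile is built in a single pass over the sequence, and no pairwise alignment is computed. This is the kind of efficiency the abstract attributes to AF methods. The fixed similarity measure here is also an example of the "fixed scoring rubric" that the abstract contrasts with learned scoring.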
Taxonomy
Topics: Gene expression and cancer classification
