A Misclassification Network-Based Method for Comparative Genomic Analysis

Wan He; Tina Eliassi-Rad; Samuel V. Scarpino

arXiv:2412.07051 · q-bio.GN · January 17, 2025

TL;DR

This paper introduces GMNA (Genome Misclassification Network Analysis), a framework that combines AI models with network science to classify genomes and to identify what drives misclassification. It is demonstrated on SARS-CoV-2 data, with implications for global health analysis.

Contribution

The study presents GMNA, a new flexible, network-based genomic classification method that incorporates misclassified instances and learned scoring to enhance interpretability and accuracy.
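As a rough illustration of the idea of a misclassification network, the sketch below turns a classifier's errors into weighted directed edges between classes. It is not the authors' implementation: the function name `misclassification_edges`, the edge weighting (the fraction of a class's samples assigned to another class), and the toy location labels are all assumptions made for this example.

```python
from collections import defaultdict


def misclassification_edges(true_labels, predicted_labels):
    """Build weighted directed edges (true_class -> predicted_class) from errors.

    Hypothetical helper for illustration only. The weight of an edge is the
    fraction of samples of the true class that were assigned to the other class.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    error_counts = defaultdict(int)  # (true, predicted) -> number of errors
    class_totals = defaultdict(int)  # true class -> number of samples

    for true, predicted in zip(true_labels, predicted_labels):
        class_totals[true] += 1
        if true != predicted:
            error_counts[(true, predicted)] += 1

    # Normalize each error count by the size of its source class.
    return {
        edge: count / class_totals[edge[0]]
        for edge, count in error_counts.items()
    }


# Toy labels (sampling locations); these values are made up for the example.
true_locations = ["UK", "UK", "US", "US", "US", "FR"]
predicted_locations = ["UK", "US", "US", "UK", "US", "FR"]
edges = misclassification_edges(true_locations, predicted_locations)
```

In this toy setup, an edge from one location to another means that genomes sampled in the first were sometimes labeled as the second. Correctly classified samples contribute only to the class totals, so a class with no errors has no outgoing edges.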

Findings

01. Successfully classified SARS-CoV-2 genomes by sampling location.

02. Revealed insights into how human mobility influences viral spread.

03. Demonstrated the framework's effectiveness with multiple AI models.

Abstract

Classifying genome sequences based on metadata has been an active area of research in comparative genomics for decades with many important applications across the life sciences. Established methods for classifying genomes can be broadly grouped into sequence alignment-based and alignment-free models. Conventional alignment-based models rely on genome similarity measures calculated based on local sequence alignments or consistent ordering among sequences. However, such methods are computationally expensive when dealing with large ensembles of even moderately sized genomes. In contrast, alignment-free (AF) approaches measure genome similarity based on summary statistics in an unsupervised setting and are efficient enough to analyze large datasets. However, both alignment-based and AF methods typically assume fixed scoring rubrics that lack the flexibility to assign varying importance to…
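To make the alignment-free idea concrete, here is a minimal sketch of one common summary-statistic approach: comparing two sequences by the cosine similarity of their k-mer count vectors. The abstract does not say which statistics are meant, so the choice of k-mer counts, the value of `k`, and the helper names `kmer_counts` and `cosine_similarity` are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=3):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[kmer] * counts_b[kmer] for kmer in shared)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sequences, made up for the example; real genomes are far longer.
similarity = cosine_similarity(kmer_counts("ACGTACGTAC"), kmer_counts("ACGTTCGTAC"))
```

No alignment is computed here: each sequence is reduced to a fixed summary (its k-mer counts), which is why this family of methods scales to large datasets. Every k-mer also carries the same weight, which is an instance of the fixed scoring the abstract describes.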

Code & Models

Repositories

wanhe13/Genome-Misclassification-Network-Analysis-GMNA- (PyTorch, official)

Taxonomy

Topics: Gene expression and cancer classification