Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification
Sarwan Ali, Bikram Sahoo, Alexander Zelikovsky, Pin-Yu Chen, Murray Patterson

TL;DR
This paper introduces a benchmarking framework for evaluating the robustness of machine learning models that classify SARS-CoV-2 genome sequences, particularly under simulated sequencing errors. The framework is meant to support more reliable model assessment and a better understanding of the virus.
Contribution
To the authors' knowledge, it is the first work to systematically benchmark the robustness of ML models to simulated sequencing errors in SARS-CoV-2 genome classification.
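A robustness benchmark of this kind typically compares a classifier's performance on clean sequences with its performance on the same sequences after simulated errors have been injected. The following Python sketch is an illustrative assumption and is not the paper's evaluation code. The names `robustness_report`, `accuracy`, and `Predictor` are hypothetical. The reported quantities (clean accuracy, perturbed accuracy, accuracy drop, and prediction consistency) are generic metrics chosen for illustration.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: a classifier is any function mapping a sequence to a label.
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, seqs: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of sequences whose predicted label matches the true label."""
    if not seqs or len(seqs) != len(labels):
        raise ValueError("seqs and labels must be non-empty and of equal length")
    correct = sum(predict(s) == y for s, y in zip(seqs, labels))
    return correct / len(seqs)


def robustness_report(
    predict: Predictor,
    clean: Sequence[str],
    perturbed: Sequence[str],
    labels: Sequence[str],
) -> Dict[str, float]:
    """Compare performance on clean inputs with performance on perturbed copies.

    Assumes perturbed[i] is an error-injected version of clean[i].
    """
    if len(clean) != len(perturbed):
        raise ValueError("clean and perturbed must be aligned one-to-one")
    clean_acc = accuracy(predict, clean, labels)
    pert_acc = accuracy(predict, perturbed, labels)
    # Fraction of inputs whose predicted label is unchanged by the perturbation.
    consistency = sum(
        predict(c) == predict(p) for c, p in zip(clean, perturbed)
    ) / len(clean)
    return {
        "clean_acc": clean_acc,
        "perturbed_acc": pert_acc,
        "acc_drop": clean_acc - pert_acc,
        "consistency": consistency,
    }


def toy_predict(seq: str) -> str:
    """Hypothetical stand-in classifier: label "A" if the motif GGG is present."""
    return "A" if "GGG" in seq else "B"


if __name__ == "__main__":
    report = robustness_report(
        toy_predict,
        clean=["AGGGA", "ATTTA"],
        perturbed=["AGCGA", "ATTTA"],
        labels=["A", "B"],
    )
    print(report)
    # {'clean_acc': 1.0, 'perturbed_acc': 0.5, 'acc_drop': 0.5, 'consistency': 0.5}
```

Reporting prediction consistency alongside accuracy separates two questions: whether the model is still correct after perturbation, and whether its output changed at all.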
Findings
Some simulation-based approaches are more robust (and accurate) than others for specific embedding methods.
Certain embedding methods withstand adversarial attacks better.
Benchmarking helps in understanding model behavior and virus evolution.
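Measuring how far a perturbation moves a sequence in embedding space is one way to see why some embeddings tolerate sequencing errors better than others. The sketch below uses k-mer frequency vectors, a common alignment-free embedding for genome sequences, purely as an assumed example. It is not presented as one of the specific embedding methods the paper evaluates, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product
from typing import List


def kmer_embedding(seq: str, k: int = 3) -> List[float]:
    """Normalized frequency vector over all 4**k DNA k-mers (lexicographic order).

    k-mers containing characters other than A/C/G/T (e.g. N) are ignored.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in vocab)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    clean = "ACGTACGTAC"
    noisy = "ACGAACGTAC"  # one substitution: T -> A at index 3
    sim = cosine_similarity(kmer_embedding(clean), kmer_embedding(noisy))
    print(round(sim, 3))  # 0.791
```

A single substitution changes up to k overlapping k-mers. A larger k therefore gives a more specific embedding that is also more sensitive to point errors, which is the kind of trade-off a robustness benchmark can expose.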
Abstract
The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome: millions of sequences and counting. This amount of data, while orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses, is nonetheless a rich resource for machine learning (ML) approaches as alternatives for extracting such important information from these data. It is hence of utmost importance to design a framework for testing and benchmarking the robustness of these ML models. This paper makes the first effort (to our knowledge) to benchmark the robustness of ML models by simulating biological sequences with errors. In this paper, we introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and…
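The simplest perturbation of this kind is a substitution error, in which a sequenced base is misread as a different nucleotide. Substitutions are the dominant error type on Illumina short-read platforms. The sketch below assumes a uniform, independent per-base substitution rate. This is a deliberate simplification and not the paper's error model: real platform profiles also vary with read position and sequence context, and long-read platforms are known for higher insertion and deletion rates. The function name `inject_substitutions` and its parameters are hypothetical.

```python
import random
from typing import Optional

NUCLEOTIDES = "ACGT"


def inject_substitutions(seq: str, error_rate: float, seed: Optional[int] = None) -> str:
    """Return a copy of seq in which each A/C/G/T base is independently replaced,
    with probability error_rate, by one of the other three nucleotides.

    Assumption: uniform per-base substitution model (no indels, no position or
    context dependence). Non-ACGT characters such as N are left unchanged.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    rng = random.Random(seed)  # seeded generator for reproducible perturbations
    out = []
    for base in seq:
        if base in NUCLEOTIDES and rng.random() < error_rate:
            # Replace with one of the three other nucleotides, chosen uniformly.
            out.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    clean = "ATGTTTGTTTTTCTTGTTTTATTGCCACTAG"  # toy sequence
    noisy = inject_substitutions(clean, error_rate=0.1, seed=42)
    n_errors = sum(a != b for a, b in zip(clean, noisy))
    print(noisy, n_errors)
```

Applying such a function to every sequence in a held-out set at several error rates yields perturbed copies against which a trained classifier can be re-evaluated.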
Taxonomy
Topics: Genomics and Phylogenetic Studies · Vaccines and Immunoinformatics Approaches · Machine Learning in Bioinformatics
