SARS-CoV-2 virus RNA sequence classification and geographical analysis with convolutional neural networks approach
Selcuk Yazar

TL;DR
This study uses convolutional neural networks to classify SARS-CoV-2 RNA sequences by continent of origin, reporting an average AUC of 98%, to help in understanding how the virus spreads and evolves.
Contribution
It introduces a novel approach of transforming RNA sequences into images for CNN-based classification and applies it to phylogenetic analysis of Turkish virus variants.
Findings
Achieved an average AUC of 98% in classifying sequences by continent (Asia, Europe, America, and Oceania)
Successfully used CNNs for phylogenetic analysis of Turkish variants
Compared results with GISAID gene alignment data
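The 98% figure is an average AUC over the continent classes. The excerpt does not say how the average was taken, so the following is only a minimal sketch assuming a one-vs-rest, macro-averaged AUC. The function names `binary_auc` and `macro_ovr_auc` and the toy inputs are hypothetical and are not taken from the paper.

```python
def binary_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, true_labels, classes):
    """Average the one-vs-rest AUC over all classes (assumed averaging scheme)."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]           # predicted probability of class k
        labels = [1 if t == cls else 0 for t in true_labels]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Toy example with made-up scores, only to show the call shape.
toy_classes = ["Asia", "Europe"]
toy_probs = [[0.9, 0.1], [0.2, 0.8]]
toy_truth = ["Asia", "Europe"]
toy_auc = macro_ovr_auc(toy_probs, toy_truth, toy_classes)
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for illustration; a rank-based computation would be the usual choice for larger test sets.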
Abstract
The Covid-19 infection, which emerged in December 2019, spread worldwide, and is still active, has caused more than 250 thousand deaths to date. Research on this subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where…
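As an illustration of the sequence-to-image idea, the sketch below maps each base to a grey level and wraps the sequence into fixed-width rows, giving a 2D array a CNN could take as input. The excerpt does not name the two image processing algorithms used in the study, so the base-to-pixel mapping, the padding value, and the function name `sequence_to_image` are all assumptions made for this example, not the paper's method.

```python
# Hypothetical grey-level encoding; the paper's actual mapping is not given in the excerpt.
BASE_TO_PIXEL = {"A": 0, "C": 85, "G": 170, "U": 255, "T": 255}
UNKNOWN_PIXEL = 128  # used for ambiguous bases such as N, and for padding


def sequence_to_image(seq, width):
    """Turn a nucleotide string into a list of pixel rows, each `width` wide."""
    if width <= 0:
        raise ValueError("width must be positive")
    pixels = [BASE_TO_PIXEL.get(base, UNKNOWN_PIXEL) for base in seq.upper()]
    # Pad the final row so that every row has the same length.
    remainder = len(pixels) % width
    if remainder:
        pixels.extend([UNKNOWN_PIXEL] * (width - remainder))
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


# Toy usage on a five-base fragment.
image = sequence_to_image("AUGCN", 3)
```

Here `T` and `U` share a grey level so that sequences stored in DNA notation and RNA notation produce the same image; that is a design choice of this sketch.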
Taxonomy
Topics: Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches · SARS-CoV-2 and COVID-19 Research
