Exploring The Potential Of GANs In Biological Sequence Analysis
Taslim Murad, Sarwan Ali, Murray Patterson

TL;DR
This paper investigates the use of Generative Adversarial Networks (GANs) to generate synthetic data for biological sequence analysis, aiming to address data imbalance issues and improve classification accuracy across multiple datasets.
Contribution
It introduces a novel GAN-based data augmentation method specifically designed for biological sequence datasets to enhance machine learning classification performance.
Findings
GANs effectively generate realistic synthetic data
Improved classification accuracy on multiple biological datasets
GAN-based augmentation outperforms traditional methods like SMOTE
Abstract
Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms, such as viruses, and in building prevention mechanisms to limit their spread and impact, as viruses are known to cause epidemics that can become global pandemics. Machine learning (ML) technologies provide new tools for biological sequence analysis that effectively analyze the functions and structures of the sequences. However, these ML-based methods face challenges from the data imbalance generally associated with biological sequence datasets, which hinders their performance. Although various strategies exist to address this issue, such as the SMOTE algorithm, which creates synthetic data, they focus on local information rather…
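To make the abstract's remark about SMOTE and local information concrete, here is a minimal sketch of SMOTE-style oversampling in plain Python. It is an illustration of the baseline idea only, not the paper's GAN method or the reference SMOTE implementation. The function name `smote_like_oversample`, its parameters, and the toy feature vectors are assumptions made for this example; the page does not say how the sequences are embedded.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a random
    minority sample and one of its k nearest minority neighbours.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base` among the
        # other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        # The new point lies on the segment between `base` and `other`.
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy numeric embeddings of minority-class sequences (made-up values).
minority = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.2], [0.3, 1.1]]
new_points = smote_like_oversample(minority, n_new=5)
```

Each synthetic point is a convex combination of two nearby minority samples, so it can only fall on a segment between existing neighbours. That is the sense in which this kind of oversampling relies on local information.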
Taxonomy
Topics: Machine Learning in Bioinformatics · Vaccines and Immunoinformatics Approaches · Genomics and Phylogenetic Studies
Methods: Synthetic Minority Over-sampling Technique (SMOTE)
