Macromolecule Classification Based on the Amino-acid Sequence
Faisal Ghaffar, Sarwar Khan, Gaddisa O., Chen Yu-jhen

TL;DR
This paper applies deep learning models to classify amino-acid sequences into DNA, RNA, protein, or hybrid classes, reaching nearly 99% training and test accuracy; CNN, LSTM, BiLSTM, and GRU architectures were compared.
Contribution
It applies word-embedding techniques from natural language processing (NLP) to represent amino-acid sequences as vectors for classification with deep learning models.
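Treating a biological sequence as text usually starts by splitting it into overlapping k-mers that play the role of "words", then mapping each word to an integer ID that an embedding layer can look up. The sketch below is a minimal illustration under assumptions the summary does not state: overlapping 3-mers, a vocabulary built from the training sequences, and IDs 0 and 1 reserved for padding and unknown k-mers. The helper names `kmers`, `build_vocab`, and `encode` are hypothetical, not the authors' code.

```python
from typing import Dict, List

# Assumed conventions for this sketch (not taken from the paper):
PAD_ID = 0  # reserved for padding
UNK_ID = 1  # reserved for k-mers not seen when the vocabulary was built


def kmers(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ("words")."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences: List[str], k: int = 3) -> Dict[str, int]:
    """Assign an integer ID to every k-mer seen in the training sequences."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for word in kmers(seq, k):
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # IDs 0 and 1 are reserved
    return vocab


def encode(sequence: str, vocab: Dict[str, int], max_len: int, k: int = 3) -> List[int]:
    """Map a sequence to a fixed-length list of k-mer IDs (truncated or right-padded)."""
    ids = [vocab.get(word, UNK_ID) for word in kmers(sequence, k)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


train = ["MKTAYIAK", "GATTACA"]
vocab = build_vocab(train)
print(kmers("MKTAY"))                      # ['MKT', 'KTA', 'TAY']
print(encode("MKTAQ", vocab, max_len=5))   # [2, 3, 1, 0, 0]
```

Padding to a fixed `max_len` lets sequences of different lengths be batched together. The choice of k trades vocabulary size (up to 20^k for the 20 standard amino acids) against how much local context each "word" captures.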
Findings
Achieved nearly 99% training and test accuracy in classifying sequences
Compared CNN, LSTM, BiLSTM, and GRU architectures
Demonstrated the effectiveness of NLP embedding techniques in bioinformatics
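For the CNN variant, a common text-classification design slides 1-D filters over the sequence of embedding vectors and keeps each filter's strongest response. The pure-Python sketch below shows that computation (valid convolution, ReLU, global max pooling). It illustrates the general technique, not the authors' architecture; the function name, filter sizes, and toy inputs are assumptions.

```python
from typing import List

Vector = List[float]


def conv1d_relu_maxpool(x: List[Vector], kernels: List[List[Vector]],
                        bias: List[float]) -> List[float]:
    """Valid 1-D convolution over time, ReLU, then global max pooling.

    Illustrative sketch only, not the paper's model.
    x:       T time steps, each an embedding vector of size D
    kernels: F filters, each W steps x D values
    bias:    one bias per filter
    Returns F pooled features (0.0 if the sequence is shorter than a filter).
    """
    features = []
    for kernel, b in zip(kernels, bias):
        width = len(kernel)
        best = 0.0  # starting at 0 makes max() also apply ReLU
        for start in range(len(x) - width + 1):
            s = b
            for offset in range(width):
                s += sum(k * v for k, v in zip(kernel[offset], x[start + offset]))
            best = max(best, s)
        features.append(best)
    return features


# Toy input: 3 time steps with 2-dimensional embeddings, two width-2 filters.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],    # responds to "dim 0, then dim 1"
    [[-1.0, 0.0], [0.0, -1.0]],  # negative everywhere, so ReLU zeroes it
]
print(conv1d_relu_maxpool(x, kernels, bias=[0.0, 0.0]))  # [2.0, 0.0]
```

Practical CNN classifiers use many learned filters, often of several widths; the pooled features are then passed to a dense softmax layer.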
Abstract
Deep learning plays a vital role in many fields that involve data. It has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were difficult to solve with traditional machine learning techniques. In this study, we focused on classifying protein sequences with deep learning techniques, since the study of amino-acid sequences is vital in the life sciences. We used different word embedding techniques from Natural Language Processing (NLP) to represent amino-acid sequences as vectors. Our main goal was to classify sequences into four classes: DNA, RNA, protein, and hybrid. After several experiments, we achieved almost 99% training and test accuracy. We experimented with CNN, LSTM, bidirectional LSTM, and GRU models.
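At its simplest, the pipeline in the abstract maps a tokenized sequence to a vector and then to a probability for each of the four classes. The sketch below is a deliberately minimal stand-in, assuming sequences are already integer-encoded with 0 as padding. It uses mean pooling of embeddings (an "embedding bag") instead of the CNN or recurrent encoders the authors compared, and its weights are random, so it shows the data flow only and reproduces no reported result. The class name and sizes are hypothetical.

```python
import math
import random
from typing import List

CLASSES = ["DNA", "RNA", "Protein", "Hybrid"]  # the four target classes


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class EmbeddingBagClassifier:
    """Hypothetical toy model: embedding lookup -> mean pooling -> linear -> softmax.

    Weights are random here; a real model would learn them by minimizing
    cross-entropy on labelled sequences.
    """

    def __init__(self, vocab_size: int, dim: int = 8, n_classes: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.embed = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
        self.embed[0] = [0.0] * dim  # token ID 0 is padding
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def predict_proba(self, token_ids: List[int]) -> List[float]:
        real = [t for t in token_ids if t != 0]  # drop padding before averaging
        dim = len(self.embed[0])
        n = max(len(real), 1)  # avoid division by zero for all-padding input
        pooled = [sum(self.embed[t][j] for t in real) / n for j in range(dim)]
        logits = [sum(w * p for w, p in zip(row, pooled)) + bias
                  for row, bias in zip(self.w, self.b)]
        return softmax(logits)


model = EmbeddingBagClassifier(vocab_size=13)
probs = model.predict_proba([2, 3, 1, 0, 0])
print(CLASSES[probs.index(max(probs))], [round(p, 3) for p in probs])
```

Swapping the mean-pooling step for a convolutional or recurrent encoder gives the kind of models the abstract lists, while the softmax head over the four classes stays the same.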
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Fractal and DNA sequence analysis
Methods: Test · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Gated Recurrent Unit
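The sigmoid and tanh activations listed among the methods are the building blocks of the recurrent cells. As an illustration, the sketch below performs one step of a GRU in a common formulation: sigmoid update and reset gates, a tanh candidate state, and the reset gate applied to the previous state before the recurrent product (some libraries, e.g. PyTorch, apply it after that product instead). The parameter-dictionary keys and the `zero_params` helper are hypothetical, and this is not claimed to be the authors' configuration.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per hidden unit


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b, one output per row."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def gru_step(x: Vector, h: Vector, p: Dict[str, list]) -> Vector:
    """One GRU step (illustrative); reset gate applied before the recurrent product."""
    z = [sigmoid(a) for a in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(a) for a in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(a) for a in affine(p["Wn"], x, p["Un"], rh, p["bn"])]  # candidate
    # Interpolate: z near 1 keeps the old state, z near 0 takes the candidate.
    return [zi * hi + (1.0 - zi) * ni for zi, hi, ni in zip(z, h, n)]


def zero_params(input_size: int, hidden_size: int) -> Dict[str, list]:
    """Hypothetical helper: all-zero weights, useful only as a sanity check."""
    p: Dict[str, list] = {}
    for g in ("z", "r", "n"):
        p["W" + g] = [[0.0] * input_size for _ in range(hidden_size)]
        p["U" + g] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p["b" + g] = [0.0] * hidden_size
    return p


h = [1.0, -1.0]
params = zero_params(input_size=1, hidden_size=2)
for x_t in ([0.3], [0.7]):  # a toy sequence of 1-D "embeddings"
    h = gru_step(x_t, h, params)
print(h)  # [0.25, -0.25]
```

With all-zero parameters both gates equal 0.5 and the candidate is 0, so each step halves the previous state, which makes a convenient sanity check. In a classifier the step runs across the embedded sequence and the final hidden state feeds the output layer; a bidirectional variant also runs a right-to-left pass and concatenates the two final states.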
