DeepProteomics: Protein family classification using Shallow and Deep Networks
Anu Vazhayil, Vinayakumar R, Soman KP

TL;DR
This paper compares shallow and deep neural networks, including RNN, LSTM, and GRU models, for classifying protein families from sequence data. The best result is a maximum accuracy of around 78%, which offers a computational alternative to time-consuming laboratory annotation.
Contribution
It introduces a comparative analysis of shallow and deep neural network models for protein family classification using sequence data.
Findings
Reached a maximum of around 78% accuracy in protein family classification on 40,433 Swiss-Prot proteins grouped into 30 families.
Deep neural networks outperform shallow networks in this task.
Sequence-based neural models can effectively classify proteins without laboratory experiments.
Abstract
Knowledge of protein function is necessary because it gives a clear picture of biological processes. However, many protein sequences are discovered and added to databases without functional annotation, and annotating them through laboratory experiments takes a considerable amount of time. This creates a need for computational techniques that classify proteins by function. In our work, we collected data from Swiss-Prot containing 40,433 proteins grouped into 30 families. We pass the sequences to recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) models, and compare them with a deep neural network and a shallow neural network applied to trigram features of the same dataset. With this approach, we achieve a maximum accuracy of around 78% for the classification of protein families.
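The abstract does not say how the trigram features are built. As a minimal sketch only, assuming overlapping three-residue windows over the 20 standard amino acids and a plain count vector as input to the shallow and deep networks, the featurization could look like the following. The names `trigrams`, `trigram_counts`, and `VOCAB` are illustrative and do not come from the paper.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are in the vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Fixed vocabulary of all 20^3 = 8000 possible trigrams, mapped to indices.
VOCAB = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=3))}


def trigrams(sequence):
    """Return the overlapping 3-residue windows of a protein sequence."""
    return [sequence[i:i + 3] for i in range(len(sequence) - 2)]


def trigram_counts(sequence):
    """Map a sequence to an 8000-dimensional trigram count vector.

    Trigrams containing non-standard residues (e.g. 'X') are skipped.
    """
    vector = [0] * len(VOCAB)
    for tri, count in Counter(trigrams(sequence)).items():
        index = VOCAB.get(tri)
        if index is not None:
            vector[index] = count
    return vector


if __name__ == "__main__":
    seq = "MKVLAX"  # toy sequence, not taken from Swiss-Prot
    print(trigrams(seq))
    print(sum(trigram_counts(seq)))
```

A fixed-length vector like this suits a feed-forward network, whereas the recurrent models in the abstract would consume the sequence residue by residue. Whether the paper used counts, binary indicators, or another weighting is not stated here.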
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Protein Structure and Dynamics
