Deep Learning Methods for Protein Family Classification on PDB Sequencing Data
Aaron Wang

TL;DR
This paper evaluates deep learning models, especially convolutional neural networks, for classifying protein families from amino acid sequences, and shows on PDB data that they outperform classical methods.
Contribution
The study introduces and compares novel bi-directional LSTM and convolutional models for protein classification, benchmarking against traditional machine learning approaches.
Findings
Deep learning models outperform classical methods in protein classification.
Convolutional neural networks achieve the highest inference accuracy.
Deep models show promise for automated protein function prediction.
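To make the input side of sequence-based classification concrete, here is a minimal sketch of how an amino acid sequence could be turned into fixed-length numeric input for a convolutional or recurrent model. It is an illustration under stated assumptions, not the paper's preprocessing: the names AMINO_ACIDS, PAD, UNK, encode, and one_hot, the 20-letter alphabet, and the pad/unknown scheme are all hypothetical choices.

```python
# Hypothetical illustration; the paper's actual preprocessing is not given here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
PAD, UNK = 0, 1                       # assumed ids for padding and unknown residues
INDEX = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq, max_len):
    """Map a sequence to integer ids, truncated or right-padded to max_len."""
    ids = [INDEX.get(ch, UNK) for ch in seq.upper()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


def one_hot(ids):
    """Expand integer ids into one-hot rows of width len(AMINO_ACIDS) + 2."""
    size = len(AMINO_ACIDS) + 2
    return [[1 if j == i else 0 for j in range(size)] for i in ids]


if __name__ == "__main__":
    ids = encode("ACX", 5)       # 'X' is not a standard residue, so it maps to UNK
    print(ids)
    print(len(one_hot(ids)[0]))  # width of each one-hot row
```

Integer ids of this kind would typically feed an embedding layer, while the one-hot form can be passed directly to a 1D convolution; which of the two the paper uses is not stated in this summary.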
Abstract
Proteins are macromolecules composed of amino acid chains whose sequence influences how they fold and thus dictates their function and features. They play a central role in major biological processes and are required for the structure, function, and regulation of the body's tissues. Understanding protein functions is vital to the development of therapeutics and precision medicine, so the ability to classify proteins and their functions based on measurable features is crucial. Indeed, the automatic inference of a protein's properties from its sequence of amino acids, known as its primary structure, remains an important open problem in bioinformatics, especially given recent advances in sequencing technologies and the large number of known but uncategorized proteins with unknown properties. In this work, we demonstrate and compare the performance of…
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Computational Drug Discovery Methods
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory
