Machine Learning for Classification of Protein Helix Capping Motifs
Sean Mullane, Ruoyan Chen, Sri Vaishnavi Vemulapalli, Eli J. Draizen, Ke Wang, Cameron Mura, Philip E. Bourne

TL;DR
This paper applies deep learning, specifically BiLSTM models, to classify protein helix capping motifs from structural data, reaching 85% class-balanced accuracy and offering a more robust alternative to heuristic methods.
Contribution
The study introduces a deep learning approach for classifying helix capping motifs directly from structural data, improving accuracy over traditional heuristic methods.
Findings
Achieved 85% class-balanced accuracy with BiLSTM.
Compared the deep learning model's performance against a baseline support vector classifier (SVC).
Utilized structural features like torsion angles and physicochemical properties.
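The class-balanced accuracy reported above can be illustrated with a short sketch. This summary does not give the paper's exact definition, so the code assumes the common one: the unweighted mean of per-class recall. The function name and the "cap" / "non-cap" labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (assumed definition, see lead-in)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    total = defaultdict(int)    # number of true examples per class
    correct = defaultdict(int)  # number of correctly predicted examples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    # Recall is computed only for classes that occur in y_true.
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 2 capping residues, 8 non-capping residues.
y_true = ["cap"] * 2 + ["non-cap"] * 8
# A trivial classifier that always predicts the majority class.
y_pred = ["non-cap"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(plain, balanced)
```

On this toy data the majority-class predictor gets a plain accuracy of 0.8 but a balanced accuracy of only 0.5, which is why a class-balanced metric is the more informative choice when capping residues are much rarer than non-capping ones.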
Abstract
The biological function of a protein stems from its 3-dimensional structure, which is thermodynamically determined by the energetics of interatomic forces between its amino acid building blocks (the order of amino acids, known as the sequence, defines a protein). Given the costs (time, money, human resources) of determining protein structures via experimental means such as X-ray crystallography, can we better describe and compare protein 3D structures in a robust and efficient manner, so as to gain meaningful biological insights? We begin by considering a relatively simple problem, limiting ourselves to just protein secondary structural elements. Historically, many computational methods have been devised to classify amino acid residues in a protein chain into one of several discrete secondary structures, of which the most well-characterized are the geometrically regular α-helix…
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM
