Predicting protein secondary structure with Neural Machine Translation
Evan Weissburg, Ian Bulovic

TL;DR
This paper analyzes a tool that applies Neural Machine Translation to protein secondary structure prediction from primary structure. It reports 65.9% Q3 accuracy with subsecond prediction time, even for batched inputs, and hypothesizes that the framework better encodes relationships between nearby but non-adjacent amino acids.
Contribution
It adapts the Neural Machine Translation framework to protein secondary structure prediction, with modifications intended to improve accuracy on protein sequences.
Findings
Achieved 65.9% Q3 accuracy in secondary structure prediction.
Provided a fast prediction tool with subsecond prediction time, even for batched inputs.
Analyzed strengths and weaknesses of the NMT-based model.
Abstract
We present an analysis of a novel tool for protein secondary structure prediction using the recently investigated Neural Machine Translation framework. The tool provides a fast and accurate folding prediction based on primary structure, with subsecond prediction time even for batched inputs. We hypothesize that Neural Machine Translation can improve upon current predictive accuracy by better encoding complex relationships between nearby but non-adjacent amino acids. We give an overview of our modifications to the framework that improve accuracy on protein sequences. We report 65.9% Q3 accuracy and analyze the strengths and weaknesses of our predictive model.
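For readers unfamiliar with the metric, Q3 accuracy is the fraction of residues whose three-state label (helix H, strand E, coil C) is predicted correctly. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `q3_accuracy` and the example label strings are hypothetical and chosen only for illustration.

```python
def q3_accuracy(predicted: str, actual: str) -> float:
    """Return the fraction of positions where the 3-state labels agree.

    Both arguments are strings over the alphabet {H, E, C}, one
    character per residue (an assumed encoding for this sketch).
    """
    if len(predicted) != len(actual):
        raise ValueError("label sequences must have the same length")
    if not actual:
        raise ValueError("label sequences must be non-empty")
    # Count per-residue matches, then normalize by sequence length.
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels for a 16-residue protein (not data from the paper).
actual = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEECCC"

score = q3_accuracy(predicted, actual)
print(f"Q3 = {score:.3f}")
```

In this toy example 13 of the 16 labels agree. A reported Q3 is typically aggregated over all residues in a test set rather than computed on a single sequence.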
Taxonomy
Topics: RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
