Predicting protein secondary structure with Neural Machine Translation

Evan Weissburg; Ian Bulovic

arXiv:1809.09210·q-bio.QM·May 11, 2021

TL;DR

This paper introduces a neural machine translation-based tool for protein secondary structure prediction, achieving fast, accurate results by encoding complex amino acid relationships, with a reported 65.9% Q3 accuracy.

Contribution

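It adapts the Neural Machine Translation framework to protein secondary structure prediction, on the hypothesis that the framework can improve predictive accuracy by better encoding relationships between nearby but non-adjacent amino acids, while keeping prediction time under a second.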
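To make the translation framing concrete, the sketch below shows one way a protein could be presented to a sequence-to-sequence model: the amino acid sequence is the source "sentence" and the secondary structure string is the target. The one-token-per-residue scheme, the function name `to_parallel_pair`, and the example sequence are illustrative assumptions, not details taken from the paper.

```python
def to_parallel_pair(primary: str, secondary: str) -> tuple[list[str], list[str]]:
    """Split a protein into aligned source/target token lists.

    Assumption: one token per residue on both sides. The paper's actual
    tokenization is not stated in this summary.
    """
    if len(primary) != len(secondary):
        raise ValueError("primary and secondary structure must be the same length")
    return list(primary), list(secondary)


# Hypothetical example: 8 residues with 3-state labels (H helix, E strand, C coil).
source, target = to_parallel_pair("MKTAYIAK", "CCHHHHHC")
```

Under this framing, a translation model learns to emit one structure label per input residue, so context from non-adjacent residues can influence each label.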

Findings

01

Achieved 65.9% Q3 accuracy in secondary structure prediction.

02

Provided a fast prediction tool with subsecond batch processing.

03

Analyzed strengths and weaknesses of the NMT-based model.

Abstract

We present an analysis of a novel tool for protein secondary structure prediction using the recently investigated Neural Machine Translation framework. The tool provides a fast and accurate folding prediction based on primary structure, with subsecond prediction time even for batched inputs. We hypothesize that Neural Machine Translation can improve upon current predictive accuracy by better encoding complex relationships between nearby but non-adjacent amino acids. We give an overview of our modifications to the framework that improve accuracy on protein sequences. We report 65.9% Q3 accuracy and analyze the strengths and weaknesses of our predictive model.
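Q3 accuracy is the fraction of residues whose predicted three-state label (helix, strand, or coil) matches the observed label. A minimal sketch of the metric follows; the function name `q3_accuracy` and the example strings are hypothetical and are not taken from the paper's evaluation code.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of positions where the 3-state labels agree.

    Labels are assumed to be single characters: H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must be the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example with a single mismatch at index 5.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"
score = q3_accuracy(predicted, observed)  # 9 of 10 residues agree
```

On this scale, the reported 65.9% means that roughly two of every three residues were assigned the correct state.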

Taxonomy

Topics: RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies