An Investigation in Optimal Encoding of Protein Primary Sequence for Structure Prediction by Artificial Neural Networks
Aaron Hein, Casey Cole, Homayoun Valafar

TL;DR
This study systematically evaluates various input encodings, window sizes, and neural network architectures to optimize protein structure prediction, achieving significant improvements in dihedral angle accuracy.
Contribution
It presents a systematic evaluation of 2,541 combinations of input encoding, window size, and network architecture, identifying the best-performing configurations for protein structure prediction.
Findings
One-hot encoding, LSTM networks, and window sizes of 9, 11, and 15 residues performed best.
Predicted phi dihedral angles to within 14-16 degrees and psi dihedral angles to within 23-25 degrees.
Achieved notable accuracy improvements over previous methods.
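The phi and psi accuracies above are angular differences. This summary does not say exactly how the authors computed them. The following is a minimal sketch that assumes mean absolute error with wrap-around at ±180 degrees, so that 170° and -170° count as 20° apart rather than 340°. The function names `angular_error` and `mean_angular_error` are hypothetical.

```python
def angular_error(predicted_deg: float, true_deg: float) -> float:
    """Absolute difference between two angles in degrees, respecting wrap-around.

    Assumption: errors are measured on the circle, so the result is in [0, 180].
    """
    diff = (predicted_deg - true_deg) % 360.0  # map the raw difference into [0, 360)
    return min(diff, 360.0 - diff)             # take the shorter way around the circle


def mean_angular_error(predicted: list[float], true: list[float]) -> float:
    """Mean of per-residue angular errors (hypothetical helper, not from the paper)."""
    if not predicted or len(predicted) != len(true):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(angular_error(p, t) for p, t in zip(predicted, true))
    return total / len(predicted)


# Usage: two residues whose errors are 20 and 15 degrees.
print(mean_angular_error([170.0, -60.0], [-170.0, -45.0]))  # 17.5
```

Ignoring wrap-around would report the first residue as 340 degrees off. That would inflate the average for angles near ±180, which is common for psi.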
Abstract
Machine learning and the use of neural networks have grown rapidly over the past few years, driven primarily by the ever-increasing accessibility of data and the growth of computational power. It has become much easier to harness machine learning for predictive tasks. Protein structure prediction is one area where neural networks are becoming increasingly popular and successful. Although very powerful, artificial neural networks (ANNs) require selecting the most appropriate input/output encoding, architecture, and class to produce optimal results. In this investigation, we explored and evaluated the effect of several conventional and newly proposed input encodings and selected an optimal architecture. We considered 11 variations of input encoding, 11 alternative window sizes, and 7 different architectures. In total, we evaluated 2,541 permutations in application to the…
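The abstract pairs input encoding with window size. The findings report that one-hot encoding with windows of 9, 11, or 15 residues worked best. The exact scheme is not described here. The following is a minimal sketch that rests on several assumptions: the 20 standard amino acids, a window centered on the residue whose dihedrals are being predicted, and all-zero vectors for padding beyond the chain ends. The names `one_hot` and `windowed_one_hot` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
PAD = "-"  # hypothetical padding symbol for positions beyond the chain ends


def one_hot(residue: str) -> list[int]:
    """20-dimensional one-hot vector; padding or unknown residues become all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1
    return vec


def windowed_one_hot(sequence: str, window: int) -> list[list[list[int]]]:
    """For each residue, return the one-hot vectors of the window centered on it.

    Output shape: len(sequence) x window x 20.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer, e.g. 9, 11 or 15")
    half = window // 2
    padded = PAD * half + sequence + PAD * half
    windows = []
    for center in range(len(sequence)):
        segment = padded[center:center + window]  # residues center-half .. center+half
        windows.append([one_hot(r) for r in segment])
    return windows


# Usage: a 3-residue peptide with a window of 3.
features = windowed_one_hot("ACD", 3)
print(len(features), len(features[0]), len(features[0][0]))  # 3 3 20
```

Each window is kept as a sequence of per-residue vectors, which is the shape a recurrent layer such as an LSTM consumes. A feed-forward network would instead flatten it to window × 20 inputs.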
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Vaccines and Immunoinformatics Approaches
