Single-Sequence-Based Protein Secondary Structure Prediction using One-Hot and Chemical Encodings of Amino Acids
Hoa Trinh, Satish Kumar Thittamaranahalli

TL;DR
This paper introduces novel chemical encodings for amino acids that, combined with one-hot encoding, improve protein secondary structure prediction accuracy while reducing model complexity and resource requirements.
Contribution
The study presents two new chemical representations for amino acids using molecular fingerprints and FastMap, enhancing single-sequence-based secondary structure prediction models.
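To make the FastMap step concrete, here is a minimal sketch of the classic FastMap embedding applied to a pairwise distance matrix. It is not the authors' pipeline: the Tanimoto distance, the function names `tanimoto_distance` and `fastmap`, and the toy bit vectors are all assumptions made for illustration, and the paper may use a different fingerprint type, distance, or FastMap variant.

```python
import numpy as np

def tanimoto_distance(fp_a, fp_b):
    """Return 1 - |A and B| / |A or B| for two binary fingerprint vectors."""
    a = np.asarray(fp_a, dtype=bool)
    b = np.asarray(fp_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - np.logical_and(a, b).sum() / union

def fastmap(dist, k):
    """Embed n objects into at most k dimensions from an n x n distance matrix."""
    n = dist.shape[0]
    d2 = np.asarray(dist, dtype=float) ** 2  # squared residual distances
    coords = np.zeros((n, k))
    for dim in range(k):
        # Pivot heuristic: start at object 0, hop to the farthest object, hop again.
        b = int(np.argmax(d2[0]))
        a = int(np.argmax(d2[b]))
        dab2 = d2[a, b]
        if dab2 <= 1e-12:
            break  # no residual distance left to explain
        # Cosine-law projection of every object onto the line through the pivots.
        x = (d2[a] + dab2 - d2[b]) / (2.0 * np.sqrt(dab2))
        coords[:, dim] = x
        # Remove the part of each distance explained by this coordinate.
        # Clamp at zero because non-Euclidean distances can yield negative residuals.
        d2 = np.maximum(d2 - (x[:, None] - x[None, :]) ** 2, 0.0)
    return coords

# Hypothetical placeholder fingerprints (made-up bits, not real molecular data).
toy_fps = {
    "X1": [1, 0, 0, 1, 0, 0],
    "X2": [1, 0, 0, 0, 0, 0],
    "X3": [1, 1, 1, 1, 0, 1],
}
names = list(toy_fps)
dist = np.array([[tanimoto_distance(toy_fps[p], toy_fps[q]) for q in names] for p in names])
embedding = fastmap(dist, k=2)  # one low-dimensional vector per object
```

Each row of `embedding` could then serve as a fixed-length chemical encoding for the corresponding residue, in place of or alongside a one-hot vector.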
Findings
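Chemical encodings provide information about amino acid interactions that an LSTM-based model cannot capture from one-hot encoding alone.
The ensemble of one-hot and chemical encoding models is more accurate than the LSTM-based model of SPOT-1D-Single on most test sets.
Each encoding model requires approximately nine times fewer trainable parameters.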
Abstract
In protein secondary structure prediction, each amino acid in a sequence is typically treated as a distinct category and represented by a one-hot vector. In this study, we developed two novel chemical representations for amino acids utilizing molecular fingerprints and the dimensionality reduction algorithm FastMap. We demonstrate that the two new chemical encodings can provide additional information about the interactions of amino acids in sequences that an LSTM-based model cannot capture with one-hot encoding alone. Compared to the latest LSTM-based model used in the single-sequence-based method SPOT-1D-Single, our ensemble model utilizing one-hot and chemical encodings achieves better accuracy across most test sets while requiring approximately nine times fewer trainable parameters for each encoding model. Our single-sequence-based method is valuable for its simplicity, lower resource…
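The one-hot baseline mentioned in the abstract can be sketched as follows. This is an illustrative example rather than the authors' code: the alphabet ordering, the name `one_hot_encode`, and the choice to map unknown residues to an all-zero row are assumptions.

```python
import numpy as np

# The 20 standard amino acids, ordered alphabetically by one-letter code (an assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(sequence):
    """Return a (len(sequence), 20) matrix with a single 1 per known residue."""
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(sequence):
        idx = AA_INDEX.get(aa)
        if idx is not None:  # unknown symbols stay as an all-zero row (assumption)
            encoded[pos, idx] = 1.0
    return encoded

features = one_hot_encode("MKTAYIAK")  # arbitrary example sequence
```

Because every pair of distinct one-hot rows is equally far apart, this representation carries no notion of chemical similarity between residues, which is the gap the chemical encodings are meant to fill.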
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Genomics and Phylogenetic Studies
