Application of Sequence Embedding in Protein Sequence-Based Predictions
Nabil Ibtehaz, Daisuke Kihara

TL;DR
This paper reviews recent advances in applying sequence embedding techniques from NLP to protein sequences, highlighting their use in contact prediction, secondary structure prediction, and function prediction.
Contribution
It provides a comprehensive overview of various protein sequence embedding methods and their applications in bioinformatics.
Findings
Sequence embeddings improve prediction accuracy
Embedding methods enable new insights into protein structure
Various embedding approaches are reviewed and compared
Abstract
In sequence-based predictions, an input sequence is conventionally represented by a multiple sequence alignment (MSA) or a representation derived from an MSA, such as a position-specific scoring matrix. Recently, inspired by developments in natural language processing, several applications of sequence embedding have been observed. Here, we review different approaches to protein sequence embedding and their applications, including protein contact prediction, secondary structure prediction, and function prediction.
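To make the conventional representation mentioned in the abstract concrete, the following is a minimal sketch of how a position-specific scoring matrix could be derived from an MSA. It is not the authors' method: the function name `pssm_from_msa`, the toy alignment, the uniform background frequency, and the fixed pseudocount are all simplifying assumptions made for illustration. Production tools typically also apply sequence weighting and empirical background frequencies.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are ignored below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_msa(msa, pseudocount=1.0):
    """Build a simple log-odds profile from an aligned set of sequences.

    Returns one dict per alignment column, mapping each amino acid to
    log2(observed frequency / background frequency).
    Assumptions (illustrative only): uniform background, additive pseudocount.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Count residues in this column, skipping gaps ('-') and unknowns.
        counts = Counter(row[col] for row in msa if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column_scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column_scores[aa] = math.log2(freq / background)
        pssm.append(column_scores)
    return pssm


# Hypothetical four-sequence alignment of length 4.
toy_msa = ["MKVL", "MKIL", "MRVL", "M-VL"]
pssm = pssm_from_msa(toy_msa)
```

Each column of the resulting profile is a fixed-length vector of 20 scores, which is the kind of per-residue input that a learned sequence embedding is intended to replace or complement.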
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Bioinformatics and Genomic Networks
