Exploring the Protein Sequence Space with Global Generative Models
Sergio Romero-Romero, Sebastian Lindner, Noelia Ferruz

TL;DR
This paper reviews the use of global generative models, especially language models, for exploring the protein sequence space, including design, structure prediction, and directed evolution applications.
Contribution
It provides a comprehensive overview of recent advances in protein generative models, highlighting the role of language models and non-Transformer architectures in protein research.
Findings
Language models enable novel protein design and structure prediction.
Recent models achieve unprecedented performance in protein generation.
Applications include directed evolution and artificial protein creation.
Abstract
Recent advancements in specialized large-scale architectures for training on images and language have profoundly impacted the fields of computer vision and natural language processing (NLP). Language models, such as the recent ChatGPT and GPT-4, have demonstrated exceptional capabilities in processing, translating, and generating human languages. These breakthroughs have also been reflected in protein research, leading to the rapid development of numerous new methods in a short time, with unprecedented performance. Language models, in particular, have seen widespread use in protein research, as they have been utilized to embed proteins, generate novel ones, and predict tertiary structures. In this book chapter, we provide an overview of the use of protein generative models, reviewing 1) language models for the design of novel artificial proteins, 2) works that use non-Transformer…
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research · Protein Structure and Dynamics
