Controllable Protein Design with Language Models
Noelia Ferruz, Birte Höcker

TL;DR
This paper explores how transformer-based language models can be fine-tuned and controlled to generate novel, functional protein sequences, potentially transforming protein design and understanding folding principles.
Contribution
It introduces the concept of using NLP transformer models for controllable protein sequence generation, highlighting recent advances and future potential in the field.
Findings
Transformer models can generate diverse protein sequences.
Control tags enable specific protein function design.
Model interpretability can reveal folding principles.
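The control-tag finding above can be made concrete with a small sketch. It shows only the input formatting step, where a tag token is prepended to the amino-acid tokens so that a language model could learn to associate the tag with the sequences that follow it. The tag names, the vocabulary layout, and the helper functions are hypothetical illustrations, not taken from the paper or from any particular model.

```python
# Minimal sketch of control-tag conditioning at the tokenization level.
# Assumption: tag names and vocabulary layout are invented for illustration.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical control tags; a real model defines its own tag vocabulary.
CONTROL_TAGS = ["<hydrolase>", "<binder>"]

# Vocabulary: control tags first, then one token per amino acid.
VOCAB = {tok: i for i, tok in enumerate(CONTROL_TAGS + list(AMINO_ACIDS))}


def encode(tag, sequence):
    """Return token ids with the control-tag token prepended to the residues."""
    if tag not in CONTROL_TAGS:
        raise ValueError(f"unknown control tag: {tag}")
    return [VOCAB[tag]] + [VOCAB[aa] for aa in sequence]


def generation_prompt(tag):
    """At generation time the prompt is just the tag; the model continues it."""
    return encode(tag, "")
```

During training, every sequence would be passed through `encode` with its annotated tag; at generation time, `generation_prompt` would supply the tag alone and an autoregressive model would sample the residues that follow.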
Abstract
The 21st century is presenting humankind with unprecedented environmental and medical challenges. The ability to design novel proteins tailored for specific purposes could transform our ability to respond timely to these issues. Recent advances in the field of artificial intelligence are now setting the stage to make this goal achievable. Protein sequences are inherently similar to natural languages: Amino acids arrange in a multitude of combinations to form structures that carry function, the same way as letters form words and sentences that carry meaning. Therefore, it is not surprising that throughout the history of Natural Language Processing (NLP), many of its techniques have been applied to protein research problems. In the last few years, we have witnessed revolutionary breakthroughs in the field of NLP. The implementation of Transformer pre-trained models has enabled text…
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dropout · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Layer Normalization · Softmax
