Guiding Generative Protein Language Models with Reinforcement Learning
Filippo Stocco, Maria Artigues-Lleixa, Andrea Hunklinger, Talal Widatalla, Marc Guell, Noelia Ferruz

TL;DR
This paper introduces a reinforcement learning framework that iteratively guides protein language models toward proteins with user-defined properties, so that they can produce high-fitness variants on demand within a few iterations.
Contribution
It presents a novel method combining reinforcement learning with protein language models to steer their outputs toward user-defined objectives, enabling rapid and targeted protein design.
Findings
Achieved a 26-fold increase in EGFR binder affinity in two iterations.
Successfully guided pLMs toward various protein properties such as topology and binding affinity.
Reached the target properties in a few RL iterations, moving through long evolutionary trajectories.
Abstract
Protein language models (pLMs) have demonstrated success at generating functional proteins across vast sequence spaces but lack the ability to design high-fitness variants on demand. Here, we iteratively guide pLMs toward user-defined objectives by applying reinforcement learning (RL). We demonstrate that RL can steer pLMs toward various protein properties, such as topologies or binding affinities, in a few iterations through long evolutionary trajectories. We apply our framework to the design of epidermal growth factor receptor (EGFR) binders, achieving a 26-fold increase in binding affinity in two iterations.
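The iterative loop described in the abstract (sample sequences, score them against an objective, update the generator) can be illustrated with a small toy. The sketch below is not the authors' method: the abstract does not name the RL algorithm, so a plain REINFORCE update with a mean-reward baseline is used here as a stand-in. The "pLM" is replaced by an independent categorical distribution per position, and `toy_reward` (fraction of alanine residues) is a hypothetical placeholder for a real oracle such as a topology or affinity predictor. All function names and hyperparameters are assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_sequence(logits, rng):
    """Sample one residue index per position from the toy policy."""
    indices = range(len(AMINO_ACIDS))
    return [rng.choices(indices, weights=softmax(pos))[0] for pos in logits]


def toy_reward(seq, target="A"):
    """Hypothetical objective: fraction of positions equal to `target`."""
    t = AMINO_ACIDS.index(target)
    return sum(1 for aa in seq if aa == t) / len(seq)


def rl_iteration(logits, rng, batch_size=64, lr=20.0):
    """One round: sample a batch, score it, apply a REINFORCE update in place.

    Returns the mean reward of the batch (measured before the update).
    """
    batch = [sample_sequence(logits, rng) for _ in range(batch_size)]
    rewards = [toy_reward(s) for s in batch]
    baseline = sum(rewards) / len(rewards)
    # Freeze the sampling probabilities before touching the logits.
    probs = [softmax(pos) for pos in logits]
    for seq, reward in zip(batch, rewards):
        advantage = reward - baseline
        for pos, aa in enumerate(seq):
            for k in range(len(AMINO_ACIDS)):
                # Gradient of log softmax: indicator minus probability.
                grad = (1.0 if k == aa else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad / batch_size
    return baseline


def run(n_iterations=60, length=12, seed=0):
    """Start from a uniform policy and record mean reward per iteration."""
    rng = random.Random(seed)
    logits = [[0.0] * len(AMINO_ACIDS) for _ in range(length)]
    return [rl_iteration(logits, rng) for _ in range(n_iterations)]


history = run()
print(f"mean reward: first={history[0]:.3f} last={history[-1]:.3f}")
```

In this toy, the mean reward should rise across iterations as probability mass shifts toward the rewarded residue. With a real pLM, the per-position logits would be replaced by the model's parameters and the reward by an experimental or computational fitness estimate.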
Taxonomy
Topics: Topic Modeling · Machine Learning in Bioinformatics · Natural Language Processing Techniques
