Preference optimization of protein language models as a multi-objective binder design paradigm
Pouria Mistani, Venkatesh Mysore

TL;DR
This paper introduces a multi-objective protein binder design method using instruction fine-tuning and preference optimization on language models, enabling targeted binder generation with improved properties.
Contribution
It presents a novel alignment strategy for autoregressive protein language models that incorporates multiple design objectives through direct preference optimization.
Findings
Median isoelectric point (pI) improved by 17-60%.
After alignment, ProtGPT2 effectively designs binders conditioned on specified receptors and a drug developability criterion.
Demonstrates the potential of language models in multi-objective protein design.
Abstract
We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model through direct optimization on expert-curated preference sequence datasets comprising preferred and dispreferred distributions. We show that the proposed alignment strategy enables ProtGPT2 to effectively design binders conditioned on specified receptors and a drug developability criterion. Generated binder samples demonstrate median isoelectric point (pI) improvements of 17-60%.
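To make the preference-optimization step concrete, the following is a minimal sketch of the standard per-pair DPO objective. It assumes summed sequence log-probabilities are already available for a preferred and a dispreferred binder under both the policy being tuned and a frozen reference model. The abstract does not specify the exact loss variant or hyperparameters used, so the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions, not the authors' implementation.

```python
import math


def dpo_loss(logp_pref, logp_dispref, ref_logp_pref, ref_logp_dispref, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) sequence pair.

    All four inputs are summed token log-probabilities of a full sequence.
    The names and the default beta are hypothetical, chosen for illustration.
    """
    # Implicit reward margin: how much more the policy favors the preferred
    # sequence over the dispreferred one, relative to the reference model.
    margin = beta * (
        (logp_pref - ref_logp_pref) - (logp_dispref - ref_logp_dispref)
    )
    # Loss is -log(sigmoid(margin)), computed in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Example: the policy has shifted probability toward the preferred binder.
loss = dpo_loss(logp_pref=-8.0, logp_dispref=-12.0,
                ref_logp_pref=-10.0, ref_logp_dispref=-10.0)
print(loss)
```

When the policy and the reference agree exactly, the margin is zero and the loss equals log 2. The loss falls below that value as the policy assigns relatively more probability to the preferred sequence. In a multi-objective setting, each objective (for example receptor specificity or a pI-based developability criterion) would contribute its own preferred and dispreferred pairs to the training set.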
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics
