Preference optimization of protein language models as a multi-objective binder design paradigm

Pouria Mistani; Venkatesh Mysore

arXiv:2403.04187·physics.bio-ph·March 8, 2024·3 cites

TL;DR

This paper introduces a multi-objective protein binder design method that applies instruction fine-tuning and direct preference optimization (DPO) to an autoregressive protein language model (ProtGPT2), enabling receptor-conditioned binder generation with an improved isoelectric point (pI).

Contribution

It presents a novel alignment strategy for autoregressive protein language models that incorporates multiple design objectives through direct preference optimization.

Findings

01 Median isoelectric point (pI) improved by 17-60%.

02 Effective design of binders conditioned on receptors and developability criteria.

03 Demonstrates the potential of language models in multi-objective protein design.

Abstract

We present a multi-objective binder design paradigm based on instruction fine-tuning and direct preference optimization (DPO) of autoregressive protein language models (pLMs). Multiple design objectives are encoded in the language model through direct optimization on expert-curated preference sequence datasets comprising preferred and dispreferred distributions. We show the proposed alignment strategy enables ProtGPT2 to effectively design binders conditioned on specified receptors and a drug developability criterion. Generated binder samples demonstrate median isoelectric point (pI) improvements of 17% to 60%.
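
As a rough illustration of the preference-optimization step the abstract describes, the sketch below computes the standard DPO loss for a single preferred/dispreferred binder pair. It is not the authors' code: the function names, the `beta` value, and the example log-probabilities are assumptions chosen for illustration, and the inputs stand for summed sequence log-probabilities from the model being tuned and from a frozen reference model, both conditioned on the same receptor prompt.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_preferred: float,
    policy_logp_dispreferred: float,
    ref_logp_preferred: float,
    ref_logp_dispreferred: float,
    beta: float = 0.1,  # assumed value; not taken from the paper
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is a summed log-probability of a full binder sequence
    given the receptor prompt, under the policy or the frozen reference.
    """
    # How much more the policy favors each sequence than the reference does.
    preferred_ratio = policy_logp_preferred - ref_logp_preferred
    dispreferred_ratio = policy_logp_dispreferred - ref_logp_dispreferred

    # Scaled margin between the preferred and dispreferred log-ratios.
    margin = beta * (preferred_ratio - dispreferred_ratio)

    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Hypothetical log-probabilities for one (preferred, dispreferred) pair.
    loss = dpo_loss(-10.0, -12.0, -11.0, -11.0)
    print(f"DPO loss: {loss:.4f}")
```

The loss falls below log 2 once the policy favors the preferred sequence more strongly than the reference does, which is how a criterion such as pI could be encoded purely through the choice of preferred and dispreferred sequences.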

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics