InstructPro: Natural Language Guided Ligand-Binding Protein Design
Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li

TL;DR
InstructPro is a generative model that designs ligand-binding proteins from natural language instructions and ligand formulas, sidestepping the scarcity of protein-ligand complex data and generalizing to unseen ligands.
Contribution
The paper introduces InstructPro, a novel natural language-guided protein design model trained on a large-scale dataset, outperforming baselines and demonstrating strong zero-shot capabilities.
Findings
Achieves high AlphaFold3 ipTM scores and binding affinities on seen ligands.
Maintains robust zero-shot performance on unseen ligands.
Scaling model size improves design quality and generalization.
Abstract
The de novo design of ligand-binding proteins with tailored functions is essential for advancing biotechnology and molecular medicine, yet existing AI approaches are limited by scarce protein-ligand complex data. To circumvent this data bottleneck, we leverage the abundant natural language descriptions characterizing protein-ligand interactions. Here, we introduce InstructPro, a family of generative models that design proteins following the guidance of natural language instructions and ligand formulas. InstructPro produces protein sequences consistent with specified function descriptions and ligand targets. To enable training and evaluation, we develop InstructProBench, a large-scale dataset of 9.6 million (function description, ligand, protein) triples. We train two model variants -- InstructPro-1B and InstructPro-3B -- that substantially outperform strong baselines. InstructPro-1B…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- This work introduces a highly intuitive method that directly translates natural language instructions into functional protein sequences.
- The method's effectiveness is demonstrated with state-of-the-art results on a new, large-scale benchmark dataset, which itself is a major contribution to the field.
- The experimental results are not fully convincing. It is unclear why the evaluation computes similarity between the designed protein and the ground-truth structure.
- The performance of RFdiffusion2 as a baseline is surprisingly poor, even though it is a strong model validated by wet-lab experiments.
- The diversity and novelty of the designed proteins are not reported, which limits understanding of the model’s generative capabilities.
1. The paper is easy to read and follow, and the presentation is clear.
2. The authors build a practical benchmark, which will benefit future research.
3. The experiments indicate improved performance compared with the baselines.
1. The success rates of InstructPro are still relatively low. Could the authors also introduce other baselines that are not based on deep learning?
2. The technical novelty is limited. InstructPro essentially combines existing modules, such as PubMedBERT, RoBERTa, and ProGen2, for a new task. Although the task is meaningful, InstructPro does not consider the inductive bias from such a task.
3. The authors have tried to incorporate 3D structures by introducing EGNN but resulting worse perform
- The work proposes a rather under-explored paradigm: designing ligand-binding proteins directly from natural language instructions and ligand formulas. Integrating text semantics with ligand representations is an original contribution that extends prior language-guided protein design, which mostly ignores small-molecule conditioning.
- The authors compile InstructProBench, containing 9.59 M triples extracted from UniProt. They perform hierarchical clustering with MMseqs2 to create non-redundant trai
- **Low absolute success rates.** Even though InstructPro outperforms baselines, the overall success rates remain low: 2.46% on seen ligands and 3.14% on unseen ligands for the 1B model. The 3B model improves to 5.06% and 3.93%, but these values mean that more than 94% of generated sequences fail to satisfy the full set of structural and functional criteria. As such, the practical usefulness of the model for generating viable candidates is questionable.
- **Reliance on predicted metrics
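The "more than 94%" figure in the review above follows directly from the quoted success rates. A minimal Python sketch of that arithmetic, using only the four percentages stated in the review (the dictionary keys are illustrative labels, not names taken from the paper):

```python
# Success rates (percent) as quoted in the review; keys are illustrative labels.
success_rates = {
    ("1B", "seen"): 2.46,
    ("1B", "unseen"): 3.14,
    ("3B", "seen"): 5.06,
    ("3B", "unseen"): 3.93,
}

# The failure rate is the complement of the success rate.
failure_rates = {
    key: round(100.0 - rate, 2) for key, rate in success_rates.items()
}

for (model, split), rate in failure_rates.items():
    print(f"InstructPro-{model}, {split} ligands: {rate:.2f}% of designs fail")
```

Even in the best case (the 3B model on seen ligands), the complement stays above 94%, which is the point the reviewer is making.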
1. **Significant Contribution of a Large-Scale Dataset:** The development of **InstructProBench**, a large-scale dataset containing nearly 9.6 million (function description, ligand, protein) triples, is a substantial contribution to the field. By curating and releasing this resource, the authors are enabling further research and providing a valuable benchmark for the community to build upon.
2. **Pioneering a Promising and Novel Research Direction:** The paper tackles the ambitious and intere
### Weaknesses:
The primary weakness of this manuscript lies in its experimental design and evaluation protocol, which appears to be fundamentally misaligned with the stated goal of protein design. This raises significant concerns about the validity and interpretation of the presented results.
1. **Fundamental Issues with the Evaluation Metrics:** The core metrics chosen (RMSD to a "ground-truth structure" and the proportion of designs with RMSD < 2 Å) are appropriate for *structure prediction*
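For reference, the two metrics this review questions can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes the designed and reference structures have matched atoms and are already superimposed (a full pipeline would first align them, for example with the Kabsch algorithm), and the function names `rmsd` and `fraction_below` are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

def fraction_below(values, threshold=2.0):
    """Fraction of values strictly below the threshold (here, 2 angstroms)."""
    if not values:
        raise ValueError("values must be non-empty")
    return sum(v < threshold for v in values) / len(values)

# Toy data: a three-atom reference and two hypothetical designs.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
design_close = [(x, y + 1.0, z) for x, y, z in reference]  # shifted by 1 angstrom
design_far = [(x, y + 3.0, z) for x, y, z in reference]    # shifted by 3 angstroms

scores = [rmsd(reference, design_close), rmsd(reference, design_far)]
print(scores, fraction_below(scores))
```

A low RMSD here only says that a design resembles one known structure, which is why the review regards it as a structure-prediction metric rather than a measure of design quality.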
Taxonomy
Topics: Computational Drug Discovery Methods · Chemical Synthesis and Analysis
Methods: ALIGN
