Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search
Yaodong Yang, Yang Wang, Jinpeng Li, Pei Guo, Da Han, Guangyong Chen, Pheng-Ann Heng

TL;DR
AlphaDE is a novel framework that combines fine-tuned protein language models with Monte Carlo tree search to enhance in-silicon directed protein evolution, outperforming previous methods with minimal fine-tuning.
Contribution
It introduces a new approach integrating large language models with tree search for adaptive protein evolution, advancing beyond heuristic strategies.
Findings
AlphaDE outperforms previous state-of-the-art methods.
Effective with few-shot fine-tuning.
Supports condensing protein sequence space.
Abstract
Protein evolution through amino acid mutations is a cornerstone of life sciences. Recent advances in protein language models have shown rich evolutionary patterns, offering unprecedented potential for in-silicon directed evolution. However, existing directed evolution methods largely rely on heuristic evolution strategies and have yet to efficiently integrate the transformative protein language models with advanced optimization techniques, such as reinforcement learning, to adaptively learn superior evolution policies. To bridge this gap, we propose AlphaDE, a novel framework that evolves protein sequences by harnessing the innovative paradigms of large language models, such as fine-tuning and test-time inference. First, AlphaDE fine-tunes pretrained protein language models using masked language modeling on homologous protein sequences to activate the evolutionary plausibility of the…
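The abstract pairs a fine-tuned protein language model with Monte Carlo tree search. As a rough illustration of the search half only, the sketch below runs a bare-bones MCTS over single-site mutations. It is not the authors' implementation: the fitness function `toy_fitness`, its hidden target string, the uniform random mutation proposals (standing in for any language-model guidance), and all parameter names are assumptions made for this example.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target="MKTAYIAK"):
    """Hypothetical fitness: fraction of positions matching a made-up target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


class Node:
    def __init__(self, seq, parent=None):
        self.seq = seq
        self.parent = parent
        self.children = {}  # (position, residue) -> Node
        self.visits = 0
        self.value_sum = 0.0

    def best_child(self, c=1.4):
        # UCB1: mean reward plus an exploration bonus for rarely visited children.
        log_n = math.log(self.visits)
        return max(
            self.children.values(),
            key=lambda ch: ch.value_sum / ch.visits + c * math.sqrt(log_n / ch.visits),
        )

    def expand(self, rng):
        # Assumption: mutations are proposed uniformly at random.
        while True:
            pos = rng.randrange(len(self.seq))
            res = rng.choice(AMINO_ACIDS)
            if res != self.seq[pos] and (pos, res) not in self.children:
                break
        child = Node(self.seq[:pos] + res + self.seq[pos + 1:], parent=self)
        self.children[(pos, res)] = child
        return child


def mcts(start, fitness, iterations=200, max_children=8, seed=0):
    rng = random.Random(seed)
    root = Node(start)
    best_seq, best_fit = start, fitness(start)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes via UCB1.
        while len(node.children) >= max_children:
            node = node.best_child()
        # Expansion: add one new single-site mutant.
        child = node.expand(rng)
        # Evaluation: score the mutant directly (no rollout in this sketch).
        reward = fitness(child.seq)
        if reward > best_fit:
            best_seq, best_fit = child.seq, reward
        # Backpropagation: update statistics up to the root.
        while child is not None:
            child.visits += 1
            child.value_sum += reward
            child = child.parent
    return best_seq, best_fit
```

Calling `mcts("AAAAAAAA", toy_fitness)` returns the best sequence found and its fitness. In a setting like the one the abstract describes, the random proposals and the toy scorer would presumably be replaced by model-derived components.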
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
- The paper is well written and clearly structured, providing both background and algorithmic details.
- Combining fine-tuned PLMs with MCTS is conceptually sound and leverages recent progress in both protein modeling and search algorithms.
- Strong empirical evaluation on multiple protein datasets with reproducible settings.
- Demonstrates few-shot fine-tuning results, suggesting potential data efficiency.
- Limited novelty: The central idea closely overlaps with existing works on ML-guided directed evolution using protein LMs, particularly "Protein Design by Directed Evolution Guided by Large Language Models" (IEEE Transactions on Evolutionary Computation) [1], which already proposed LLM-based mutation guidance; and "LatentDE: Latent-based Directed Evolution for Protein Sequence Design" (Machine Learning: Science and Technology) [2], which introduced latent-space optimization for protein design u
* Low-N fitness prediction is important to enable.
* Formalizing how RL approaches can be used in PLMs is timely and can lead to productive future works.
* On the benchmarks explored, performance looks favorable, e.g. on the TEM task.
* Though this idea of RL post-training for PLMs holds promise, given the current state of the LLM field, the execution becomes quite important, and I think the paper can do better on this in terms of rigor and following through on failure cases. I personally don't think the idea itself is super novel, and I think what makes a paper like this shine would be to really help readers get intuition on how RL post-training will differ for PLMs. Even in terms of base execution, there are some decisions
The paper is well written and easy to understand. [Though I could not evaluate the experiment results because I am not from the area]
The most significant issue with the work is the lack of novelty. Most content of the work before the experiment section is known to the field. Specifically, section 3 describes the proposed method, but all the content is known to the community: the problem 3.1 is a well-known problem [1]. The masked language modeling in 3.2 is popularized by BERT [2] and is widely used in network training. The MCTS in 3.3 is popularized by AlphaGo. Therefore, I don't find the innovation in this work. I could n
- Idea to bring test-time inference like MCTS from LLMs to protein language models seems novel (I’m not super sure though, not super well contextualized).
- Results (Table 1) look substantially better than other methods on most of the problems. I’m wondering if it’s a fair comparison given this, some almost seem too good to be true. I don’t personally understand why I would expect such substantial gains over various baseline methods, and over zero-shot.
- Experiments compare to many methods for
- Poorly motivated: other work exists using pre-trained PLMs for protein design / directed evolution; these aren’t even cited (e.g., [1], [2], [3], [7]). Methods specifically for multi-round design, like [6], aren’t considered.
- It's not really made clear why anything about AlphaDE is specific or particular to directed evolution (which is iterative) and not just general protein design.
- The background section, which appears to serve as a related works section, lists some prior works that ca
Taxonomy
Topics: Language and cultural evolution · Evolutionary Algorithms and Applications · Genomics and Rare Diseases
