# Evaluating zero‐shot prediction of monomeric protein design success by AlphaFold, ESMFold, and ProteinMPNN

**Authors:** Mario Garcia, Sugyan M. Dixit, Gabriel J. Rocklin

PMC · DOI: 10.1002/pro.70453 · Protein Science: A Publication of the Protein Society · 2026-01-20

## TL;DR

This study evaluates how well AlphaFold, ESMFold, and ProteinMPNN can predict the success of newly designed proteins before experiments, finding moderate but limited accuracy.

## Contribution

The paper introduces a benchmark dataset of 614 experimentally characterized de novo designed monomers, drawn from 11 design studies published between 2012 and 2021, and evaluates how well three models predict design success.

## Key findings

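- All three models showed moderate ability to distinguish successful from unsuccessful designs, yet many failed designs scored better than successful ones, and confidence metrics varied by topology.
- ESMFold's average pLDDT was the best individual metric for predicting design success.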
- Combining confidence metrics provided only modest improvements over using ESMFold pLDDT alone.
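
To make the idea of combining confidence metrics concrete, here is a minimal sketch of a logistic regression fitted by plain gradient descent. It is not the authors' pipeline. The feature layout (ESMFold pLDDT, AlphaFold pLDDT, and a ProteinMPNN-derived score, all rescaled to 0-1 so that higher means more confident), the toy values, the learning rate, and the function names `fit_logistic` and `predict_proba` are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability that a design with features x succeeds."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up toy values, NOT data from the paper.
# Columns (assumed): ESMFold pLDDT/100, AlphaFold pLDDT/100, rescaled ProteinMPNN score.
X = [
    [0.92, 0.90, 0.80],
    [0.88, 0.85, 0.70],
    [0.81, 0.89, 0.75],
    [0.85, 0.70, 0.40],
    [0.75, 0.72, 0.50],
    [0.60, 0.55, 0.30],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = experimental success, 0 = failure

w, b = fit_logistic(X, y)
p_high = predict_proba(w, b, [0.95, 0.95, 0.90])
p_low = predict_proba(w, b, [0.50, 0.50, 0.20])
```

In practice the combined model would be compared against the single best metric on held-out designs, since the paper reports that the combination gave only a modest gain over ESMFold pLDDT alone.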

## Abstract

De novo protein design has enabled the creation of proteins with diverse functionalities that are not found in nature. Despite recent advances, experimental success rates remain inconsistent and context‐dependent, posing a bottleneck for broader applications of de novo design. To overcome this, structure and sequence prediction models have been applied to assess design quality prior to experimental testing to save time and resources. In this study, we examined the extent to which AlphaFold, ProteinMPNN, and ESMFold can discriminate between experimentally successful and unsuccessful designs. We first curated a benchmark dataset of 614 experimentally characterized de novo designed monomers from 11 different design studies between 2012 and 2021. All predictive models demonstrated moderate ability to discriminate experimental successes (expressed, soluble, monomeric, and folded with the correct secondary structure) from failures. Still, many failed designs have better confidence metrics than successful designs, and confidence metrics were topology‐dependent. Among all computational models evaluated, ESMFold average predicted local‐distance difference test (pLDDT) yielded the best individual performance at distinguishing between successful and unsuccessful designs. A logistic regression model combining all confidence metrics provided only modest improvement over ESMFold pLDDT alone. Overall, these results show that these models can serve as an initial filtering strategy prior to experimental validation; however, their utility at accurately predicting experimentally successful designs remains limited without task‐specific training.
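
One common way to quantify how well a confidence metric discriminates successes from failures is the area under the ROC curve (AUC). This summary does not state which statistic the authors used, so the sketch below is an assumption about how such a comparison might look, not their method. The function name `roc_auc` and all values are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random success outscores a random failure.

    Ties between a success and a failure count as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up average pLDDT values, NOT data from the paper.
plddt = [92.0, 88.0, 75.0, 81.0, 60.0, 85.0]
success = [1, 1, 0, 1, 0, 0]  # 1 = experimental success, 0 = failure

auc = roc_auc(plddt, success)  # 8 of 9 success/failure pairs are ranked correctly
```

In this toy set the failed design with pLDDT 85 outranks the successful design with pLDDT 81, which is the kind of overlap that keeps AUC below 1.0 and limits any single threshold used as a filter.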

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12817478/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12817478/full.md

## References

37 references with the full list in the complete paper: https://tomesphere.com/paper/PMC12817478/full.md

---
Source: https://tomesphere.com/paper/PMC12817478