# Efficient Searches in Protein Sequence Space Through AI-Driven Iterative Learning

**Authors:** Ignacio Suárez-Martín, Valeria A. Risso, Rocío Romero-Zaliz, Jose M. Sanchez-Ruiz

PMC · DOI: 10.3390/ijms26104741 · International Journal of Molecular Sciences · 2025-05-15

## TL;DR

AI models trained iteratively on only a few hundred experimentally characterized variants can find high-fitness enzyme variants and predict antibody evasion by a viral protein.

## Contribution

Demonstrates that iteratively trained machine-learning models (multi-layer perceptron, random forest, and XGBoost) can rapidly identify high-fitness enzyme variants and predict antibody evasion from minimal experimental data.

## Key findings

- AI models identified very high-fitness dihydrofolate reductase (DHFR) variants after iterative training on a total of just a few hundred sequences.
- AI accurately predicted the antibody evasion patterns of the SARS-CoV-2 Omicron RBD from limited data.
- The approach provides a basis for low-throughput experimental strategies for enzyme engineering and for assessing viral protein evolution.
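The RBD dataset covers every combination of the Omicron mutations, so each variant is naturally a binary vector: one bit per mutation, set when the Omicron residue is present. A minimal sketch follows. It assumes that the Omicron BA.1 RBD carries 15 substitutions, giving 2^15 = 32,768 combinations, which matches the "~30,000 variants" quoted in the abstract. The labels `m0` to `m14` and the helper names are hypothetical placeholders, not the real mutation names or the authors' encoding.

```python
import itertools

# Assumption: 15 RBD substitutions; labels are hypothetical placeholders.
N_MUTATIONS = 15
MUTATIONS = [f"m{k}" for k in range(N_MUTATIONS)]


def all_genotypes(n=N_MUTATIONS):
    """Yield every combination as a 0/1 tuple: 1 = Omicron residue, 0 = ancestral residue."""
    return itertools.product((0, 1), repeat=n)


def to_feature_vector(present):
    """Encode a set of mutation labels as the binary vector a tabular model would take."""
    return [1 if m in present else 0 for m in MUTATIONS]


n_variants = sum(1 for _ in all_genotypes())
print(n_variants)  # 32768, i.e. 2**15
print(to_feature_vector({"m0", "m3"}))
```

With this encoding, a random forest or XGBoost regressor can be trained directly on the 0/1 columns. Because every genotype is enumerable, the model can then score the whole library after each round.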

## Abstract

The protein sequence space is vast. This fact, together with the prevalence of epistasis, hampers the engineering of novel enzymes through library screening and is a major obstacle to any attempt to predict natural protein evolution. Recently, specialized methodologies have been used to determine fitness data on ~260,000 sequences for the gene of the enzyme dihydrofolate reductase and antibody affinity data for all combinations of the mutations present in the receptor-binding domain (RBD) of the Omicron strain of SARS-CoV-2 (~30,000 variants). We show that upon iterative training on a total of just a few hundred variants, various state-of-the-art AI tools (multi-layer perceptron, random forest, and XGBoost algorithms) find very high fitness variants of the enzyme and predict the antibody evasion patterns of the RBD. This work provides a basis for efficient, widely applicable, low-throughput experimental approaches to assess viral protein evolution and to engineer enzymes for biotechnological applications.
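The iterative strategy described in the abstract can be sketched as a loop: train a model on the variants measured so far, use it to pick the next batch to measure, and repeat. The toy illustration below is not the authors' pipeline. The landscape is synthetic (additive site effects plus random pairwise epistasis), and the 4-site, 4-residue space, the batch sizes, and the number of rounds are arbitrary choices. A crude additive averaging model stands in for the multi-layer perceptron, random forest, and XGBoost regressors used in the paper. The names `make_landscape`, `fit_additive`, and `iterative_search` are hypothetical.

```python
import itertools
import random

# Hypothetical toy space: 4 mutable sites, 4 residue options each (4**4 = 256 variants).
SITES = 4
ALPHABET = "ACDE"


def make_landscape(seed=0):
    """Map each variant (tuple of residues) to a synthetic fitness built from
    additive site effects plus pairwise epistatic terms (not real data)."""
    rng = random.Random(seed)
    additive = {(i, a): rng.gauss(0, 1) for i in range(SITES) for a in ALPHABET}
    pairwise = {
        (i, a, j, b): rng.gauss(0, 0.5)
        for i, j in itertools.combinations(range(SITES), 2)
        for a in ALPHABET
        for b in ALPHABET
    }
    landscape = {}
    for v in itertools.product(ALPHABET, repeat=SITES):
        f = sum(additive[(i, v[i])] for i in range(SITES))
        f += sum(pairwise[(i, v[i], j, v[j])]
                 for i, j in itertools.combinations(range(SITES), 2))
        landscape[v] = f
    return landscape


def fit_additive(measured):
    """Crude additive surrogate: per (site, residue) mean deviation from the global mean.
    It ignores epistasis; the paper's nonlinear models would replace this step."""
    global_mean = sum(measured.values()) / len(measured)
    effects = {}
    for i in range(SITES):
        for a in ALPHABET:
            vals = [f for v, f in measured.items() if v[i] == a]
            effects[(i, a)] = (sum(vals) / len(vals) - global_mean) if vals else 0.0

    def predict(v):
        return global_mean + sum(effects[(i, v[i])] for i in range(SITES))

    return predict


def iterative_search(landscape, n_initial=20, batch=10, rounds=4, seed=1):
    """Measure a random initial sample, then repeatedly retrain and measure the top-ranked batch."""
    rng = random.Random(seed)
    variants = list(landscape)
    measured = {v: landscape[v] for v in rng.sample(variants, n_initial)}
    for _ in range(rounds):
        predict = fit_additive(measured)
        # Rank the still-unmeasured variants by predicted fitness and "measure" the best batch.
        candidates = [v for v in variants if v not in measured]
        candidates.sort(key=predict, reverse=True)
        for v in candidates[:batch]:
            measured[v] = landscape[v]
    return measured


landscape = make_landscape()
measured = iterative_search(landscape)
best = max(measured, key=measured.get)
print(len(measured), "".join(best), round(measured[best], 2))
```

An additive surrogate can only rank variants by their summed single-site effects. Swapping in a nonlinear regressor is what would let such a loop exploit epistatic interactions, which is the motivation for the model classes named in the abstract.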

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Genes:** DHFR (dihydrofolate reductase) [NCBI Gene 1719] {aka DHFR1, DYR}
- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12112320/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12112320/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12112320/full.md

---
Source: https://tomesphere.com/paper/PMC12112320