# Machine learning for enzyme catalytic activity: current progress and future horizons

**Authors:** Sizhe Qiu, Haris Saeed, Will Leonard, Feiran Li, Aidong Yang

PMC · DOI: 10.1093/bib/bbag002 · Briefings in Bioinformatics · 2026-01-25

## TL;DR

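This paper reviews how machine learning is being used to predict enzyme substrate specificity, turnover number, and catalytic optimum in support of enzyme mining and engineering for industrial applications, and it highlights key modeling strategies, current limitations, and future directions.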

## Contribution

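The paper identifies three modeling strategies as particularly useful for predicting enzyme catalytic activity: attention mechanisms, new input features such as product information and temperature, and transfer learning that leverages different datasets.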
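To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product attention for a single query vector, written in plain Python. It is a generic illustration and not the architecture of any model covered by the review. The function names (`softmax`, `attention_pool`), the idea of treating the query as a substrate embedding and the keys/values as per-residue embeddings, and all numbers are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  list of d floats (hypothetical substrate embedding)
    keys:   one list of d floats per residue (hypothetical)
    values: one list of floats per residue (hypothetical)
    Returns the weighted sum of the values and the attention weights.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted average of the value vectors.
    pooled = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(dim)
    ]
    return pooled, weights


# Toy inputs with made-up numbers: three "residues", embedding size 2.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

pooled, weights = attention_pool(query, keys, values)
```

In this toy case the first residue's key is most aligned with the query, so it receives the largest weight. That is the property that lets attention-based models emphasize the positions most relevant to a given input.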

## Key findings

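- Three modeling strategies stand out as effective for enzyme modeling: attention mechanisms, new features such as product information and temperature, and transfer learning that leverages different datasets.
- Dataset imbalance is one existing limitation of enzyme catalytic activity prediction, and the review suggests potential enhancements to address it.
- Accurate ML predictors of enzyme catalytic activity could transform enzyme and metabolic engineering and the optimization of biocatalysis.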
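As an illustration of the dataset imbalance point, the sketch below shows one generic mitigation: weighting each training sample by the inverse frequency of its target bin, so that sparsely populated activity ranges are not drowned out. This summary does not say which remedies the review recommends, so treat this as an assumed, commonly used approach rather than the authors' proposal. The function name `inverse_frequency_weights`, the bin width, and the `log_kcat` values are all hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(targets, width=1.0):
    """Weight each sample by 1 / (number of samples in its target bin).

    targets: list of floats, e.g. log10-transformed turnover numbers
    width:   bin width in the same units as the targets (assumed value)
    Weights are rescaled so they sum to the number of samples.
    """
    low = min(targets)
    # Assign every target to a fixed-width bin, starting at the minimum.
    bins = [int((t - low) // width) for t in targets]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Made-up targets: five samples in a crowded range, one in a sparse range.
log_kcat = [0.1, 0.3, 0.5, 0.7, 0.9, 2.5]
weights = inverse_frequency_weights(log_kcat)
```

The resulting weights can be passed to any training routine that accepts per-sample weights. Here the lone sample in the sparse range ends up with five times the weight of each sample in the crowded range.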

## Abstract

Enzyme catalysis, with its advantages in environmental sustainability and efficiency, is gaining traction across diverse industrial applications, such as waste utilization and pharmaceutical biomanufacturing. However, optimizing enzyme catalytic activity remains a significant challenge. To facilitate enzyme mining and engineering, machine learning (ML) models have emerged to predict enzyme substrate specificity, enzyme turnover number, and enzyme catalytic optimum. This review endeavored to assist researchers in effectively utilizing predictive models for enzyme catalytic activity through presenting recent advancements and analyzing different approaches. We also pointed out existing limitations (e.g. dataset imbalance) and offered suggestions on potential enhancements to address them. We identified that the attention mechanism, inclusion of new features such as product information and temperature, and using transfer learning to leverage different datasets were three main useful modeling strategies. Furthermore, we envisaged that accurate predictors of enzyme catalytic activity would potentially transform enzyme and metabolic engineering, and the optimization of biocatalysis.

## Full-text entities

- **Chemicals:** amino acid (MESH:D000596), sugar (MESH:D000073893), CatOpt (-), carbon dioxide (MESH:D002245)
- **Species:** Rhodotorula glutinis (species) [taxon 5535], Armillaria solidipes (species) [taxon 1076256], Escherichia coli K-12 (strain) [taxon 83333], Buttiauxella sp. (species) [taxon 1972222]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12832030/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12832030/full.md

## References

130 references — full list in the complete paper: https://tomesphere.com/paper/PMC12832030/full.md

---
Source: https://tomesphere.com/paper/PMC12832030