# KmPred: prediction of Michaelis constants (Km) using an integrative machine learning framework

**Authors:** Meshari Alazmi

PMC · DOI: 10.3389/frai.2026.1711471 · Frontiers in Artificial Intelligence · 2026-01-30

## TL;DR

KmPred is a machine learning model that predicts enzyme-substrate affinity (Km) by combining protein sequences and molecular features, outperforming existing methods.

## Contribution

Introduces KmPred, a novel integrative machine learning framework for predicting Michaelis constants using sequence embeddings and molecular descriptors.

## Key findings

- KmPred outperformed the baseline MPEK model with a test R² of 0.7049 and PCC of 0.8398.
- On the Kroll dataset, it achieved a PCC of 0.7440, comparable to state-of-the-art methods.
- Combining LSTM/Transformer features with XGBoost improved robustness and generalizability of Km predictions.

## Abstract

The Michaelis constant Km is a key kinetic parameter for quantifying enzyme-substrate affinity within Michaelis–Menten theory. Km values are traditionally determined through labor-intensive in vitro assays, but the rapid expansion of protein sequence and chemical databases has created a pressing need for computational prediction approaches.
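For readers less familiar with the parameter, the textbook Michaelis–Menten rate law is given here for context. It is the standard form, not an equation quoted from the paper. The symbols are the usual ones: $v$ is the initial reaction rate, $V_{\max}$ the maximal rate, and $[S]$ the substrate concentration.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]},
\qquad
[S] = K_m \;\Longrightarrow\; v = \frac{V_{\max}}{2}
```

Km is therefore the substrate concentration at which the enzyme runs at half its maximal rate, so a lower Km is commonly read as higher apparent affinity.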

Herein, we present an integrative machine learning framework, KmPred, for Km prediction that combines protein sequence embeddings from state-of-the-art language models with molecular descriptors derived from substrate SMILES representations. The approach was benchmarked on the MPEK dataset and on the independent dataset assembled by Kroll et al.
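The idea of merging a protein embedding with substrate descriptors can be sketched as a simple vector concatenation. The sketch below is illustrative only: this summary does not state how KmPred pools embeddings or which descriptors it uses, so `mean_pool`, `fuse_features`, and all numeric values are hypothetical placeholders.

```python
from typing import List, Sequence


def mean_pool(per_residue: Sequence[Sequence[float]]) -> List[float]:
    """Average per-residue embedding vectors into one fixed-length vector.

    Mean pooling is an assumption here; the paper may pool differently.
    """
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    dim = len(per_residue[0])
    if any(len(vec) != dim for vec in per_residue):
        raise ValueError("all residue embeddings must share one dimension")
    n = len(per_residue)
    return [sum(vec[i] for vec in per_residue) / n for i in range(dim)]


def fuse_features(protein_vec: Sequence[float],
                  substrate_desc: Sequence[float]) -> List[float]:
    """Concatenate the protein vector and substrate descriptors into one row."""
    return list(protein_vec) + list(substrate_desc)


# Toy inputs: three residues with 2-dimensional embeddings (real language-model
# embeddings have hundreds of dimensions) and two made-up substrate descriptors.
residue_embeddings = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]
substrate_descriptors = [180.16, -3.2]  # hypothetical descriptor values

protein_vec = mean_pool(residue_embeddings)
features = fuse_features(protein_vec, substrate_descriptors)
print(features)
```

Each enzyme-substrate pair yields one such row, and the stacked rows form the input matrix for a downstream regressor.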

On the MPEK dataset, the best model achieved a test MSE of 0.4995, RMSE of 0.7067, MAE of 0.5022, R² of 0.7049, and a PCC of 0.8398 (p < 1 × 10⁻⁶), outperforming the baseline MPEK model. On the Kroll dataset, KmPred achieved a test MSE of 0.6206, RMSE of 0.7878, R² of 0.5519, PCC of 0.7440, and Spearman’s ρ of 0.7342, which are reasonable results relative to state-of-the-art methods. These outcomes demonstrate that combining multi-modal protein sequence and ligand features with advanced machine learning architectures enables robust and generalizable Km prediction across diverse datasets. Specifically, we used LSTM and Transformer models solely for feature extraction, to capture complex sequential and contextual patterns in enzyme sequences, and employed XGBoost as the primary regression model for the final Km predictions. Beyond its methodological impact, this work highlights the role of AI-driven kinetic modeling in accelerating enzyme characterization, facilitating metabolic engineering, and enhancing drug discovery pipelines. Our approach thus establishes a foundation for predictive enzymology at scale, with significant potential to benefit biotechnology, synthetic biology, and national strategic initiatives such as Saudi Vision 2030.
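The regression metrics quoted above (MSE, RMSE, MAE, R², PCC) follow standard definitions, sketched below in plain Python. This is a minimal illustration rather than the paper's evaluation code. The function name `regression_metrics` and the toy values are assumptions, and Spearman's ρ is left out for brevity.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, RMSE, MAE, R^2 and Pearson correlation (PCC)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)    # spread of the targets
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)   # spread of the predictions
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - (n * mse) / ss_tot,     # 1 - SS_res / SS_tot
        "pcc": cov / math.sqrt(ss_tot * ss_pred),
    }


# Toy example with made-up values (not data from the paper).
toy = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 4.0])
print(toy)

# Sanity check on the reported figures: RMSE should be close to sqrt(MSE),
# up to rounding of the published values.
mpek_gap = abs(math.sqrt(0.4995) - 0.7067)
kroll_gap = abs(math.sqrt(0.6206) - 0.7878)
print(mpek_gap < 1e-3, kroll_gap < 1e-3)
```

The final check confirms that the reported RMSE values agree with the square roots of the reported MSE values on both datasets, within rounding.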

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12901413/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12901413/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/PMC12901413/full.md

---
Source: https://tomesphere.com/paper/PMC12901413