Leveraging Protein Language Model Embeddings for Catalytic Turnover Prediction of Adenylate Kinase Orthologs in a Low-Data Regime

Duncan F. Muir (1), Parker Grosjean (1), Margaux M. Pinney (1), Michael J. Keiser (1) ((1) University of California, San Francisco)

arXiv:2505.03066 · q-bio.QM · May 7, 2025

TL;DR

This study shows that protein language model (PLM) embeddings, especially when combined with transformer-based learnable aggregation, improve catalytic turnover prediction for Adenylate Kinase orthologs in a low-data regime, outperforming one-hot-encoding baselines and the specialized $k_{cat}$ predictors DLKcat and CatPred.

Contribution

The paper systematically evaluates PLM embeddings and aggregation methods for enzyme activity prediction, highlighting the effectiveness of transformer-based learnable aggregation and the limited benefit of fine-tuning.

Findings

01

Learned aggregation of PLM embeddings outperforms fixed aggregation.

02

Transformer-based embeddings yield better $k_{cat}$ predictions.

03

Fine-tuning does not significantly improve performance.
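To make the fixed-versus-learned distinction in the findings concrete, here is a minimal sketch in plain Python. It is an illustration only, not the authors' implementation: the function names (`mean_pool`, `attention_pool`), the toy embeddings, and the single query vector are all hypothetical, and the single-query attention pooling shown is a much simpler stand-in for the transformer-based aggregation the paper evaluates. In a real model the query would be a trained parameter.

```python
import math


def mean_pool(residue_embeddings):
    """Fixed aggregation: average per-residue vectors into one sequence vector."""
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(residue_embeddings, query):
    """Learnable aggregation (simplified, hypothetical): a query vector scores
    each residue, softmax turns scores into weights, and the output is the
    weighted sum of residue vectors."""
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, row)) / math.sqrt(dim)
        for row in residue_embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * row[j] for w, row in zip(weights, residue_embeddings))
        for j in range(dim)
    ]
    return pooled, weights


# Toy per-residue embeddings (3 residues, 2 dimensions); values are made up.
embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]

fixed = mean_pool(embeddings)

# An all-zero query gives uniform weights, so attention pooling reduces to the mean.
uniform_pooled, uniform_weights = attention_pool(embeddings, [0.0, 0.0])

# A non-zero query (assumed here, learned in practice) up-weights matching residues.
focused_pooled, focused_weights = attention_pool(embeddings, [3.0, 0.0])
```

With an all-zero query the weights are uniform and the learned pooling coincides with mean pooling, so the fixed scheme is a special case of the learnable one. Training the query lets the model emphasize residues that are informative for the prediction target.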

Abstract

Accurate prediction of enzymatic activity from amino acid sequences could drastically accelerate enzyme engineering for applications such as bioremediation and therapeutics development. In recent years, Protein Language Model (PLM) embeddings have been increasingly leveraged as the input into sequence-to-function models. Here, we use consistently collected catalytic turnover observations for 175 orthologs of the enzyme Adenylate Kinase (ADK) as a test case to assess the use of PLMs and their embeddings in enzyme kinetic prediction tasks. In this study, we show that nonlinear probing of PLM embeddings outperforms baseline embeddings (one-hot-encoding) and the specialized $k_{cat}$ (catalytic turnover number) prediction models DLKcat and CatPred. We also compared fixed and learnable aggregation of PLM embeddings for $k_{cat}$ prediction and found that transformer-based learnable…

Code & Models

Repositories

keiserlab/face-plm (PyTorch, official)

Taxonomy

Topics: Bioinformatics and Genomic Networks · Protein Structure and Dynamics · Machine Learning in Bioinformatics

Methods: Sparse Evolutionary Training