Leveraging Protein Language Model Embeddings for Catalytic Turnover Prediction of Adenylate Kinase Orthologs in a Low-Data Regime
Duncan F. Muir (1), Parker Grosjean (1), Margaux M. Pinney (1), Michael J. Keiser (1) ((1) University of California, San Francisco)

TL;DR
This study demonstrates that protein language model (PLM) embeddings, particularly when combined with transformer-based learnable aggregation, improve catalytic turnover ($k_{cat}$) prediction for Adenylate Kinase orthologs in a low-data regime, outperforming a one-hot-encoding baseline and the specialized $k_{cat}$ predictors DLKcat and CatPred.
Contribution
The study systematically evaluates PLM embeddings and aggregation methods for enzyme activity prediction, highlighting the effectiveness of transformer-based learnable aggregation and the limited benefit of fine-tuning.
Findings
Learnable aggregation of PLM embeddings outperforms fixed aggregation.
Transformer-based learnable aggregation yields better $k_{cat}$ predictions.
Fine-tuning does not significantly improve performance.
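The fixed-versus-learnable aggregation contrast in the findings above can be illustrated with a minimal sketch. A PLM yields one embedding per residue, so a protein of length $L$ gives an $L \times d$ matrix that must be pooled into a single $d$-dimensional vector before regression. Mean pooling is a fixed aggregation; attention pooling with a trainable query vector is a simple learnable one. This page does not describe the paper's transformer-based aggregator, so the single-head attention pooling below is a simplified, hypothetical stand-in, and the names `mean_pool`, `attention_pool`, and `query` are illustrative only.

```python
import numpy as np


def mean_pool(residue_emb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fixed aggregation: average per-residue embeddings over real (unpadded) positions.

    residue_emb: (batch, length, dim) per-residue PLM embeddings.
    mask: (batch, length), 1 for residues and 0 for padding.
    """
    m = mask[..., None].astype(residue_emb.dtype)
    return (residue_emb * m).sum(axis=1) / m.sum(axis=1)


def attention_pool(residue_emb: np.ndarray, mask: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Learnable aggregation (hypothetical single-head sketch).

    A trainable query vector scores every residue; a softmax over the unpadded
    positions turns the scores into weights, and the pooled vector is the
    weighted sum of residue embeddings. In practice `query` would be trained
    jointly with the downstream k_cat regressor.
    """
    dim = residue_emb.shape[-1]
    scores = residue_emb @ query / np.sqrt(dim)            # (batch, length)
    scores = np.where(mask.astype(bool), scores, -np.inf)  # padding gets zero weight
    scores = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return np.einsum("bl,bld->bd", weights, residue_emb)
```

With a zero `query` every unpadded residue receives the same weight, so `attention_pool` reduces exactly to `mean_pool`. Training `query` lets the model learn which residues to up-weight, which is the extra flexibility that learnable aggregation adds over a fixed average.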
Abstract
Accurate prediction of enzymatic activity from amino acid sequences could drastically accelerate enzyme engineering for applications such as bioremediation and therapeutics development. In recent years, Protein Language Model (PLM) embeddings have increasingly been used as inputs to sequence-to-function models. Here, we use consistently collected catalytic turnover ($k_{cat}$) measurements for 175 orthologs of the enzyme Adenylate Kinase (ADK) as a test case to assess PLMs and their embeddings in enzyme kinetic prediction tasks. We show that nonlinear probing of PLM embeddings outperforms a one-hot-encoding baseline as well as the specialized $k_{cat}$ prediction models DLKcat and CatPred. We also compared fixed and learnable aggregation of PLM embeddings for prediction and found that transformer-based learnable…
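The abstract's central comparison, a nonlinear probe on frozen PLM embeddings versus a one-hot baseline, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline. The probe architecture is not given in this excerpt, so RBF kernel ridge regression stands in as a generic nonlinear probe. Targets are assumed to be $\log_{10} k_{cat}$, since turnover numbers span orders of magnitude. The one-hot baseline here simply zero-pads unaligned sequences. All function names (`one_hot`, `rbf_kernel`, `kernel_ridge_fit_predict`, `cv_pearson`) are hypothetical. With only 175 labeled orthologs, a closed-form model scored by k-fold cross-validation keeps the example small.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seqs, max_len):
    """Baseline features: flattened one-hot encoding, zero-padded to max_len.

    Assumption: sequences are not aligned; the paper's one-hot scheme may differ.
    """
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    X = np.zeros((len(seqs), max_len, len(AMINO_ACIDS)))
    for n, s in enumerate(seqs):
        for i, aa in enumerate(s[:max_len]):
            if aa in idx:  # non-standard residues are left as all-zero rows
                X[n, i, idx[aa]] = 1.0
    return X.reshape(len(seqs), -1)


def rbf_kernel(A, B, gamma):
    """RBF kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_ridge_fit_predict(X_tr, y_tr, X_te, gamma=0.1, alpha=1.0):
    """Hypothetical nonlinear probe: closed-form RBF kernel ridge regression.

    Targets are centered before fitting and the training mean is added back.
    """
    y_mean = y_tr.mean()
    K = rbf_kernel(X_tr, X_tr, gamma)
    coef = np.linalg.solve(K + alpha * np.eye(len(X_tr)), y_tr - y_mean)
    return rbf_kernel(X_te, X_tr, gamma) @ coef + y_mean


def cv_pearson(X, y, k=5, seed=0, **probe_kwargs):
    """k-fold cross-validated Pearson r between predicted and observed targets
    (assumed here to be log10 k_cat)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty_like(y, dtype=float)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        preds[test_idx] = kernel_ridge_fit_predict(
            X[train_idx], y[train_idx], X[test_idx], **probe_kwargs
        )
    return np.corrcoef(preds, y)[0, 1]
```

To mirror the comparison in the abstract, one would call `cv_pearson` once on `one_hot(seqs, max_len)` and once on pooled PLM embeddings of the same sequences, using the same targets and folds. The `gamma` and `alpha` defaults are illustrative only and would normally be tuned, for example with nested cross-validation, so that the test folds stay untouched.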
Taxonomy
Topics: Bioinformatics and Genomic Networks · Protein Structure and Dynamics · Machine Learning in Bioinformatics
Methods: Sparse Evolutionary Training
