KinForm: Kinetics Informed Feature Optimised Representation Models for Enzyme $k_{cat}$ and $K_{M}$ Prediction
Saleh Alwer, Ronan Fleming

TL;DR
KinForm is a machine learning framework that improves the accuracy and generalisation of enzyme kinetic parameter prediction by optimising protein feature representations through multi-layer embeddings, dimensionality reduction, and data rebalancing.
Contribution
KinForm introduces a novel combination of multi-layer residue embeddings, weighted pooling, PCA, and oversampling to improve enzyme kinetics prediction beyond existing methods.
Findings
KinForm outperforms baseline models on benchmark datasets.
Improvements are especially significant for proteins with low sequence similarity.
Binding-site pooling, PCA, and oversampling contribute to enhanced performance.
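As a rough illustration of the PCA step named above, the following minimal sketch reduces a matrix of pooled protein features to a few principal components with plain NumPy. The function name `pca_reduce`, the number of components, and the random input are assumptions made for illustration; the summary does not say how KinForm configures or fits its PCA.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (hypothetical helper).

    X: array of shape (n_proteins, n_features), e.g. pooled embeddings.
    Returns the projected data, the component matrix, and the feature means.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # centre each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components, mean

# Illustrative input only: 20 "proteins" with 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
Z, components, mean = pca_reduce(X, n_components=3)
print(Z.shape)  # (20, 3)
```

In a predictive setting the components and means would normally be estimated on training data only and then reused to project held-out proteins, so that no test information leaks into the representation.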
Abstract
Kinetic parameters such as the turnover number ($k_{cat}$) and Michaelis constant ($K_{M}$) are essential for modelling enzymatic activity, but experimental data remain limited in scale and diversity. Previous methods for predicting enzyme kinetics typically use mean-pooled residue embeddings from a single protein language model to represent the protein. We present KinForm, a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters by optimising protein feature representations. KinForm combines several residue-level embeddings (Evolutionary Scale Modeling Cambrian, Evolutionary Scale Modeling 2, and ProtT5-XL-UniRef50), taken from empirically selected intermediate transformer layers, and applies weighted pooling based on per-residue binding-site probability. To counter the resulting high dimensionality, we apply dimensionality…
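The weighted pooling described in the abstract can be sketched as a probability-weighted average of residue embeddings. This is a minimal sketch under stated assumptions: the function name `binding_weighted_pool`, the normalisation by the sum of weights, and the fallback to mean pooling when all weights are zero are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def binding_weighted_pool(residue_embeddings, binding_probs, eps=1e-8):
    """Pool per-residue embeddings into one protein vector (hypothetical helper).

    residue_embeddings: array of shape (n_residues, dim).
    binding_probs: array of shape (n_residues,), per-residue binding-site
        probabilities used as pooling weights.
    """
    emb = np.asarray(residue_embeddings, dtype=float)
    w = np.asarray(binding_probs, dtype=float)
    if emb.ndim != 2 or w.shape != (emb.shape[0],):
        raise ValueError("expected (n_residues, dim) embeddings and (n_residues,) weights")
    total = w.sum()
    if total < eps:
        # Assumption: with no binding signal, fall back to plain mean pooling.
        return emb.mean(axis=0)
    # Weighted average: residues with higher probability contribute more.
    return (w[:, None] * emb).sum(axis=0) / total

# Toy example: three residues with two-dimensional embeddings.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
probs = np.array([0.0, 0.0, 1.0])
pooled = binding_weighted_pool(emb, probs)  # dominated by the third residue
```

With uniform weights this reduces to the mean pooling used by earlier methods, so the weighting only changes the representation where the binding-site predictor distinguishes between residues.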
