Kermut: Composite kernel regression for protein variant effects
Peter Mørch Groth, Mads Herbert Kerrn, Lars Olsen, Jesper Salomon, Wouter Boomsma

TL;DR
Kermut is a Gaussian process regression model with a novel composite kernel that predicts protein variant effects accurately and provides meaningful uncertainty estimates, advancing protein engineering and biological understanding.
Contribution
The paper introduces a composite kernel for Gaussian process regression that models mutation similarity. The resulting model reaches state-of-the-art accuracy for supervised protein variant effect prediction and provides uncertainty estimates through its posterior.
Findings
State-of-the-art prediction accuracy achieved.
Provides meaningful overall uncertainty calibration.
Instance-specific uncertainty calibration remains challenging.
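To make "overall calibration" concrete, the following is a minimal sketch of one common check: comparing the nominal level of Gaussian prediction intervals with the fraction of true values that actually fall inside them. The function name `interval_coverage` and the chosen levels are illustrative assumptions, not the paper's evaluation protocol. Because the check averages over all variants, it says nothing about whether each individual variant's uncertainty is right, which is the instance-specific question noted above.

```python
import numpy as np
from statistics import NormalDist


def interval_coverage(y_true, mean, std, levels=(0.5, 0.8, 0.95)):
    """Empirical coverage of central Gaussian prediction intervals.

    Hypothetical helper for illustration. For each nominal level, returns
    the fraction of targets inside mean +/- z * std, where z is the
    matching standard-normal quantile. A well-calibrated model has
    empirical coverage close to the nominal level.
    """
    y_true = np.asarray(y_true, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)

    # Standardized absolute errors; std is assumed strictly positive.
    z_scores = np.abs(y_true - mean) / std

    coverage = {}
    for level in levels:
        # Half-width of the central interval in units of std.
        z = NormalDist().inv_cdf(0.5 + level / 2.0)
        coverage[level] = float(np.mean(z_scores <= z))
    return coverage
```

Coverage well below the nominal level suggests overconfident predictions, and coverage well above it suggests underconfident ones.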
Abstract
Reliable prediction of protein variant effects is crucial for both protein optimization and for advancing biological understanding. For practical use in protein engineering, it is important that we can also provide reliable uncertainty estimates for our predictions, and while prediction accuracy has seen much progress in recent years, uncertainty metrics are rarely reported. We here provide a Gaussian process regression model, Kermut, with a novel composite kernel for modeling mutation similarity, which obtains state-of-the-art performance for supervised protein variant effect prediction while also offering estimates of uncertainty through its posterior. An analysis of the quality of the uncertainty estimates demonstrates that our model provides meaningful levels of overall calibration, but that instance-specific uncertainty calibration remains more challenging.
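As a rough illustration of how a Gaussian process with a composite kernel yields both predictions and uncertainties, here is a minimal NumPy sketch. It is not the Kermut kernel: the weighted sum of two RBF kernels, the feature groups `emb` (for example, sequence embeddings) and `site` (for example, per-site structural features), and all hyperparameter values are assumptions chosen for illustration, and the targets are assumed to be centered so that a zero-mean prior is reasonable.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel between the rows of A (n, d) and B (m, d)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def composite_kernel(emb_a, site_a, emb_b, site_b, weight=0.5):
    """Illustrative composite kernel: weighted sum of two RBF kernels.

    A non-negative weighted sum of valid kernels is itself a valid kernel.
    The two feature groups and the weight are hypothetical.
    """
    k_emb = rbf_kernel(emb_a, emb_b, lengthscale=2.0)
    k_site = rbf_kernel(site_a, site_b, lengthscale=1.0)
    return weight * k_emb + (1.0 - weight) * k_site


def gp_predict(emb_train, site_train, y_train, emb_test, site_test, noise=1e-2):
    """Zero-mean GP posterior mean and variance of the latent function."""
    n = len(y_train)
    K = composite_kernel(emb_train, site_train, emb_train, site_train)
    K_cross = composite_kernel(emb_train, site_train, emb_test, site_test)
    k_test_diag = np.diag(composite_kernel(emb_test, site_test, emb_test, site_test))

    # Cholesky factorization of the noisy training covariance.
    L = np.linalg.cholesky(K + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)
    var = k_test_diag - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)
```

The posterior variance shrinks near training variants and returns to the prior variance for variants unlike anything in the training set, which is the sense in which the posterior offers uncertainty estimates.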
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Computational Drug Discovery Methods
Methods: Gaussian Process
