Probabilistic thermal stability prediction through sparsity promoting transformer representation
Yevgen Zainchkovskyy, Jesper Ferkinghoff-Borg, Anja Bennett, Thomas Egebjerg, Nikolai Lorenzen, Per Jr. Greisen, Søren Hauberg, Carsten Stahlhut

TL;DR
This paper enhances protein property prediction by combining sparsity-promoting transformer models with a probabilistic framework, achieving more accurate and robust melting temperature predictions for drug design.
Contribution
It introduces a sparsity-promoting approach for transformer models and advocates for probabilistic modeling in ML-driven drug design tasks.
Findings
Mean absolute error of 0.23°C in Tm prediction
Sparsity improves robustness and accuracy
Probabilistic framing emphasizes uncertainty quantification
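The findings pair a point metric (mean absolute error) with a call for uncertainty quantification. The sketch below shows why the two are complementary. It assumes a Gaussian predictive distribution N(mu, sigma^2) per variant, which is an illustrative choice: the summary does not state the authors' likelihood. The function names and all Tm values are hypothetical, not taken from the paper.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the same unit as the targets (e.g. °C)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def gaussian_nll(y_true, mu, sigma):
    """Mean negative log-likelihood of y_true under N(mu, sigma^2); lower is better."""
    total = 0.0
    for t, m, s in zip(y_true, mu, sigma):
        total += 0.5 * math.log(2 * math.pi * s ** 2) + (t - m) ** 2 / (2 * s ** 2)
    return total / len(y_true)


def interval_coverage(y_true, mu, sigma, z=1.96):
    """Fraction of targets inside mu +/- z * sigma (about 95% for z = 1.96 if calibrated)."""
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, mu, sigma))
    return hits / len(y_true)


# Hypothetical Tm values (°C), not taken from the paper.
y_true = [70.0, 65.0, 72.0]
mu = [70.5, 64.0, 72.0]          # both models share these mean predictions
overconfident = [0.1, 0.1, 0.1]  # claims ~0.1 °C uncertainty everywhere
calibrated = [1.0, 1.0, 1.0]     # wider, more honest uncertainty

print(mean_absolute_error(y_true, mu))  # identical for both models
print(gaussian_nll(y_true, mu, overconfident), gaussian_nll(y_true, mu, calibrated))
print(interval_coverage(y_true, mu, overconfident), interval_coverage(y_true, mu, calibrated))
```

Both hypothetical models have the same MAE (0.5 °C). However, the overconfident one has a much higher negative log-likelihood, and its nominal 95% interval covers only one of the three targets. A point metric alone cannot reveal this difference.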
Abstract
Pre-trained protein language models have demonstrated significant applicability in different protein engineering tasks. A common way to use the latent representations of these pre-trained transformer models is to mean-pool across residue positions, reducing the feature dimensions for downstream tasks such as predicting biophysical properties or other functional behaviours. In this paper we provide a two-fold contribution to machine learning (ML) driven drug design. Firstly, we demonstrate the power of sparsity-promoting penalization of pre-trained transformer models to secure more robust and accurate melting temperature (Tm) prediction of single-chain variable fragments, with a mean absolute error of 0.23°C. Secondly, we demonstrate the power of framing our prediction problem in a probabilistic framework. Specifically, we advocate for the need to adopt probabilistic frameworks…
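The abstract describes the common practice of mean-pooling per-residue transformer embeddings into one fixed-size vector per sequence before fitting a downstream regressor. The sketch below shows that pooling with a padding mask. It then applies one generic way to promote sparsity: an L1 (lasso) penalty on a linear head, fitted with the iterative shrinkage-thresholding algorithm (ISTA). This is an assumption for illustration only. The abstract does not say where or how the authors apply their sparsity-promoting penalty, and `masked_mean_pool`, `soft_threshold` and `fit_lasso_ista` are hypothetical names.

```python
import numpy as np


def masked_mean_pool(embeddings: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average per-residue embeddings over real (non-padding) positions.

    embeddings: (batch, length, dim) per-residue representations.
    mask: (batch, length), 1 for residues and 0 for padding.
    Returns a (batch, dim) array: one fixed-size vector per sequence.
    """
    m = mask.astype(float)[..., None]            # (batch, length, 1)
    summed = (embeddings * m).sum(axis=1)        # sum over residue positions
    counts = np.clip(m.sum(axis=1), 1.0, None)   # guard against empty sequences
    return summed / counts


def soft_threshold(w: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||w||_1: shrinks weights and zeroes small ones."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_lasso_ista(X: np.ndarray, y: np.ndarray, lam: float = 0.1, n_iter: int = 1000):
    """Fit y ~ X @ w + b with an L1 penalty lam * ||w||_1 using ISTA.

    Minimises (1 / 2n) * ||Xc @ w - yc||^2 + lam * ||w||_1 on centred data,
    so the intercept b is unpenalised and recovered afterwards.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, d = Xc.shape
    L = np.linalg.norm(Xc, 2) ** 2 / n             # Lipschitz constant of the smooth gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = Xc.T @ (Xc @ w - yc) / n            # gradient of the squared-error term
        w = soft_threshold(w - grad / L, lam / L)  # gradient step, then L1 prox
    b = y_mean - x_mean @ w
    return w, b
```

In a real pipeline, `embeddings` would come from a protein language model and `y` would hold measured Tm values. The L1 penalty drives many weights to exactly zero, so the regressor relies on a sparse subset of the pooled feature dimensions.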
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Materials Science · Computational Drug Discovery Methods
