Supervised Learning of Protein Melting Temperature: Cross‐Species vs. Species‐Specific Prediction
Sebastián García López, Jesper Salomon, Wouter Boomsma

TL;DR
This paper shows that models predicting protein melting temperatures perform worse in practice than their high Spearman rho scores on cross-species data suggest.
Contribution
The study shows that cross-species training does not improve melting temperature prediction, and it highlights the limitations of current models.
Findings
Spearman rho scores over cross-species data overestimate model performance.
Cross-species training does not benefit melting temperature prediction.
Species-specific models outperform cross-species approaches.
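The first finding can be illustrated with a small synthetic sketch. It is not the authors' data or method. It assumes, purely for illustration, a toy "model" that recovers each species' mean melting temperature but carries no information about individual proteins within a species. The species names, means, and noise levels are made up, and `simulate` and `evaluate` are hypothetical helper names. Under these assumptions, the Spearman rho over the pooled data is high while the per-species rho stays near zero.

```python
import random
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate(seed=0, n_per_species=200):
    """Generate synthetic (true, predicted) melting temperatures per species."""
    rng = random.Random(seed)
    # Hypothetical species-level mean melting temperatures in deg C (made up).
    species_means = {"A": 40.0, "B": 55.0, "C": 70.0, "D": 85.0}
    data = {}
    for name, mu in species_means.items():
        true_tm = [mu + rng.gauss(0, 3) for _ in range(n_per_species)]
        # Toy model: correct species mean plus noise that is independent
        # of the individual protein's true value.
        pred_tm = [mu + rng.gauss(0, 3) for _ in range(n_per_species)]
        data[name] = (true_tm, pred_tm)
    return data


def evaluate(data):
    """Return the pooled Spearman rho and a dict of per-species rho values."""
    pooled_true = [t for true_tm, _ in data.values() for t in true_tm]
    pooled_pred = [p for _, pred_tm in data.values() for p in pred_tm]
    pooled = spearman(pooled_true, pooled_pred)
    per_species = {name: spearman(t, p) for name, (t, p) in data.items()}
    return pooled, per_species


data = simulate()
pooled_rho, per_species_rho = evaluate(data)
print(f"pooled rho: {pooled_rho:.2f}")
for name, rho in per_species_rho.items():
    print(f"species {name} rho: {rho:.2f}")
```

Reporting rho per species next to the pooled value is one simple way to check whether a score is driven mainly by differences between species rather than by differences between proteins of the same species.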
Abstract
Protein melting temperatures are important proxies for stability, and frequently probed in protein engineering campaigns, for instance for enzyme discovery and protein optimization. With the emergence of large datasets of melting temperatures for diverse natural proteins, it has become possible to train models to predict this quantity, and the literature has reported impressive performance values in terms of Spearman rho. The high correlation scores suggest that it should be possible to accurately predict melting temperature changes in engineered variants, and to reliably identify naturally thermostable proteins. However, in practice, results in these settings are often disappointing. In this paper, we explore this apparent discrepancy. We show that Spearman rho over cross‐species data gives an overly optimistic impression of prediction performance, and that this metric reflects the…
Taxonomy
Topics: Protein Structure and Dynamics · RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics
