Leveraging Multi-modal Representations to Predict Protein Melting Temperatures
Daiheng Zhang, Yan Zeng, Xinyu Hong, Jinbo Xu

TL;DR
This paper develops multi-modal protein models that combine sequence, structure, and function information to improve prediction of protein melting temperature changes (Delta Tm). It reports state-of-the-art results on the s571 test dataset.
Contribution
It introduces an approach that combines representations from multiple protein language models, extracted with several feature extraction methods, to improve Delta Tm prediction accuracy.
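The summary does not specify which feature extraction methods are used. The sketch below shows one common pattern for mutation-effect models, offered only as an illustration. It converts per-residue embeddings into fixed-length features by mean pooling over the sequence and by taking the mutant-minus-wild-type difference at the mutated site. The function names, the single-point-mutation assumption, and the plain-list embedding format are hypothetical and do not reproduce the authors' code.

```python
from typing import List

Vector = List[float]


def mean_pool(per_residue: List[Vector]) -> Vector:
    """Average an L x D matrix of per-residue embeddings into one D-dim vector."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[d] for row in per_residue) / n for d in range(dim)]


def mutation_features(wt: List[Vector], mut: List[Vector], site: int) -> Vector:
    """Hypothetical Delta Tm features for a single-point mutation.

    wt, mut: per-residue embeddings (L x D) of the wild-type and mutant
    sequences from some protein language model (assumed to have the same length L).
    site: 0-based index of the substituted residue.
    Returns [pooled difference (D values), site difference (D values)].
    """
    if len(wt) != len(mut):
        raise ValueError("wild type and mutant must have the same length")
    pooled_diff = [m - w for m, w in zip(mean_pool(mut), mean_pool(wt))]
    site_diff = [m - w for m, w in zip(mut[site], wt[site])]
    return pooled_diff + site_diff


# Toy 3-residue, 2-dim embeddings (made-up numbers, not model outputs).
wt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mut = [[1.0, 0.0], [0.5, 1.5], [1.0, 1.0]]
features = mutation_features(wt, mut, site=1)
```

The pooled difference captures global shifts in the representation, while the site difference focuses on the local environment of the substitution. Both are cheap to compute once the embeddings are available.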
Findings
The ESM-3-based model achieves a PCC of 0.50 on the s571 test dataset
Multi-modal representations outperform single-modal models
Integration of sequence, structure, and function data improves predictions
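Early fusion is one simple way to integrate sequence, structure, and function data. Each modality contributes one vector, the vectors are concatenated, and a regression head is fit on the combined features. The sketch below assumes this approach and uses made-up one-dimensional modality vectors with a linear head trained by full-batch gradient descent. It is a generic illustration of the fusion idea, not the model reported in the paper, which builds on ESM-2, ESM-3 and AlphaFold representations. The names `fuse`, `predict`, and `train_linear_head` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse(seq_emb: Sequence[float], struct_emb: Sequence[float],
         func_emb: Sequence[float]) -> List[float]:
    """Early fusion: concatenate one vector per modality into a single feature vector."""
    return list(seq_emb) + list(struct_emb) + list(func_emb)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear regression head: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_head(X: List[List[float]], y: List[float],
                      lr: float = 0.05, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit the linear head by full-batch gradient descent on mean squared error."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            err = predict(w, b, x) - target
            for d in range(dim):
                grad_w[d] += 2.0 * err * x[d] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Made-up toy data: one scalar per modality (sequence, structure, function).
raw = [((1.0,), (0.0,), (0.0,)), ((0.0,), (1.0,), (0.0,)), ((0.0,), (0.0,), (1.0,)),
       ((1.0,), (1.0,), (1.0,)), ((0.0,), (0.0,), (0.0,))]
X = [fuse(s, t, f) for s, t, f in raw]
y = [3.0, 0.0, 1.5, 2.5, 1.0]  # generated from 2*s - t + 0.5*f + 1
w, b = train_linear_head(X, y)
mse = sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(y)
```

Concatenation lets the head learn a separate weight for each modality's features. A linear head cannot model interactions between modalities, so a nonlinear head would be needed to capture those.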
Abstract
Accurately predicting protein melting temperature changes (Delta Tm) is fundamental for assessing protein stability and guiding protein engineering. Leveraging multi-modal protein representations has shown great promise in capturing the complex relationships among protein sequences, structures, and functions. In this study, we develop models based on powerful protein language models, including ESM-2, ESM-3 and AlphaFold, using various feature extraction methods to enhance prediction accuracy. By utilizing the ESM-3 model, we achieve a new state-of-the-art performance on the s571 test dataset, obtaining a Pearson correlation coefficient (PCC) of 0.50. Furthermore, we conduct a fair evaluation to compare the performance of different protein language models in the Delta Tm prediction task. Our results demonstrate that integrating multi-modal protein representations could advance the…
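The headline metric is the Pearson correlation coefficient between predicted and experimentally measured Delta Tm values. A minimal pure-Python sketch of that metric follows. The example values are invented for illustration. Delta Tm is taken here as Tm(mutant) minus Tm(wild type), a common convention that the abstract does not state explicitly.

```python
import math
from typing import Sequence


def pearson(pred: Sequence[float], true: Sequence[float]) -> float:
    """Pearson correlation coefficient: cov(pred, true) / (std(pred) * std(true))."""
    n = len(pred)
    if n != len(true) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    ss_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    ss_t = math.sqrt(sum((t - mean_t) ** 2 for t in true))
    if ss_p == 0.0 or ss_t == 0.0:
        raise ValueError("PCC is undefined for a constant sequence")
    # The 1/n normalisation cancels between numerator and denominator.
    return cov / (ss_p * ss_t)


# Invented Delta Tm values in degrees C, defined as Tm(mutant) - Tm(wild type).
true_dtm = [-4.2, -1.0, 0.3, 2.1, -2.5]
pred_dtm = [-2.0, -1.5, 0.8, 1.0, -0.5]
r = pearson(pred_dtm, true_dtm)
```

PCC is unchanged when the predictions are shifted or scaled by a positive factor, so it measures how well the linear trend of Delta Tm is captured rather than the absolute error. `scipy.stats.pearsonr` computes the same statistic.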
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Materials Science · Mass Spectrometry Techniques and Applications
Methods: AlphaFold
