Multi-Scale Representation Learning for Protein Fitness Prediction
Zuobai Zhang, Pascal Notin, Yining Huang, Aurélie Lozano, Vijil Chenthamarakshan, Debora Marks, Payel Das, Jian Tang

TL;DR
This paper introduces S3F, a multimodal protein representation model combining sequence, structure, and surface features, achieving state-of-the-art fitness prediction and offering new insights into protein function.
Contribution
The paper presents S3F, a model that integrates sequence, structure, and surface representations across scales to improve fitness prediction beyond sequence-only or structure-only models.
Findings
Achieves state-of-the-art results on the ProteinGym benchmark
Effectively leverages sequence, structure, and surface features together
Provides insights into the determinants of protein function
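To make the benchmark claim concrete: fitness predictors on ProteinGym are commonly compared by the Spearman rank correlation between model scores and experimentally measured fitness for a set of variants. The sketch below shows that metric in plain Python. It is illustrative only and is not taken from the S3F code; the function names and the toy scores are assumptions made for this example.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical model scores and assay measurements for four variants.
    model_scores = [0.1, -1.2, 0.4, -0.3]
    assay_fitness = [1.0, 0.2, 0.8, 0.5]
    print(spearman(model_scores, assay_fitness))
```

Because only the ordering of variants matters, the metric is insensitive to the scale of the model's scores, which is convenient when comparing self-supervised models whose outputs are not calibrated to assay units.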
Abstract
Designing novel functional proteins crucially depends on accurately modeling their fitness landscape. Given the limited availability of functional annotations from wet-lab experiments, previous methods have primarily relied on self-supervised models trained on vast, unlabeled protein sequence or structure datasets. While initial protein representation learning studies focused solely on either sequence or structural features, recent hybrid architectures have sought to merge these modalities to harness their respective strengths. However, these sequence-structure models have so far achieved only incremental improvements when compared to the leading sequence-only approaches, highlighting unresolved challenges in effectively leveraging these modalities together. Moreover, the function of certain proteins is highly dependent on the granular aspects of their surface topology, which have been…
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
