A Systematic Study of Joint Representation Learning on Protein Sequences and Structures
Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, Jian Tang

TL;DR
This paper systematically explores joint protein representation learning by integrating language models with structure encoders, achieving state-of-the-art results in protein function prediction.
Contribution
It introduces a framework that combines a protein language model (ESM-2) with structure encoders (GVP, GearNet, CDConv) through three representation fusion strategies, and explores different pre-training techniques on top of the joint model.
Findings
Significant performance improvements over existing methods
Effective fusion strategies for sequence and structure data
New state-of-the-art in protein function annotation
Abstract
Learning effective protein representations is critical in a variety of tasks in biology, such as predicting protein functions. Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge. In contrast, structure-based methods leverage 3D structural information with graph neural networks, and geometric pre-training methods show potential in function prediction tasks, but they still suffer from the limited number of available structures. To bridge this gap, our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM (ESM-2) with distinct structure encoders (GVP, GearNet, CDConv). We introduce three representation fusion strategies and explore different pre-training techniques. Our…
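The abstract does not spell out the three fusion strategies, so the following is only a minimal sketch of two generic ways that per-residue sequence embeddings and a structure encoder could be combined; it is not necessarily what the paper does. Everything in it is an assumption made for illustration: random arrays stand in for ESM-2 outputs, `toy_structure_encoder` is a hypothetical one-layer message-passing stand-in for GVP, GearNet, or CDConv, and the function names and the 8.0 distance cutoff are invented.

```python
import numpy as np

def toy_structure_encoder(node_feats, coords, weight, cutoff=8.0):
    """Hypothetical stand-in for a structure encoder: one round of
    mean-aggregation message passing over a distance-cutoff residue graph."""
    # Pairwise distances between residue coordinates, shape (n_res, n_res).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Residues closer than the cutoff are neighbours (self-loops included).
    adj = (dists < cutoff).astype(node_feats.dtype)
    adj /= adj.sum(axis=1, keepdims=True)  # row-normalise: mean over neighbours
    return np.tanh(adj @ node_feats @ weight)

def fuse_by_input(seq_emb, coords, weight):
    """Use sequence embeddings as the node features of the structure encoder."""
    return toy_structure_encoder(seq_emb, coords, weight)

def fuse_by_concat(seq_emb, struct_feats, coords, weight):
    """Run the structure encoder on its own features, then concatenate
    its output with the sequence embeddings residue by residue."""
    struct_emb = toy_structure_encoder(struct_feats, coords, weight)
    return np.concatenate([seq_emb, struct_emb], axis=-1)

rng = np.random.default_rng(0)
n_res, d_seq, d_struct, d_out = 6, 16, 4, 8
seq_emb = rng.normal(size=(n_res, d_seq))          # placeholder for PLM output
struct_feats = rng.normal(size=(n_res, d_struct))  # placeholder geometric features
coords = rng.normal(scale=5.0, size=(n_res, 3))    # placeholder residue coordinates

fused_input = fuse_by_input(seq_emb, coords, rng.normal(size=(d_seq, d_out)))
fused_concat = fuse_by_concat(
    seq_emb, struct_feats, coords, rng.normal(size=(d_struct, d_out))
)
# Mean-pool residues into one protein-level vector for a downstream classifier.
protein_vec = fused_input.mean(axis=0)
print(fused_input.shape, fused_concat.shape, protein_vec.shape)
```

In the first variant the structure encoder sees the sequence information directly, while in the second the two encoders stay independent until the final concatenation. Which design works better is an empirical question of the kind the paper sets out to study.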
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · Machine Learning in Materials Science
